A Doubly Optimistic Strategy for Safe Linear Bandits - Zhuanzhi (专知) Papers

Bandits · Linear · Identifiable · Notability · Estimation/Estimators

27 September 2022

A Doubly Optimistic Strategy for Safe Linear Bandits

Tianrui Chen, Aditya Gangrade, Venkatesh Saligrama

We propose DOSLB, a doubly optimistic strategy for the safe linear bandit problem. The safe linear bandit problem is to optimise an unknown linear reward whilst satisfying unknown round-wise safety constraints on actions, using stochastic bandit feedback on the reward and safety risks of actions. In contrast to prior work on aggregated resource constraints, our formulation explicitly demands control of round-wise safety risks. Unlike existing optimistic-pessimistic paradigms for safe bandits, DOSLB exercises supreme optimism, using optimistic estimates of both reward and safety scores to select actions. Yet, surprisingly, we show that DOSLB rarely takes risky actions and obtains $\tilde{O}(d \sqrt{T})$ regret, where our notion of regret accounts for both inefficiency and lack of safety of actions. Specialising to polytopal domains, we first show, notably, that the $\sqrt{T}$ regret bound cannot be improved even with large gaps, and then identify a slackened notion of regret for which we show tight instance-dependent $O(\log^2 T)$ bounds. We further argue that in such domains, the number of times an overly risky action is played is also bounded as $O(\log^2 T)$.
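The idea of selecting actions with optimistic estimates of both reward and safety can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the estimator or the confidence sets, so the version below assumes a finite action set, ridge-regression estimates, ellipsoidal confidence widths, and a fixed confidence radius `beta`. The function name `doubly_optimistic_action` and all of its parameters are hypothetical.

```python
import numpy as np

def doubly_optimistic_action(actions, V, b_reward, B_safety, alpha, beta):
    """Return the index of the action with the highest optimistic reward
    among the actions that are optimistically (plausibly) safe.

    actions:  (K, d) candidate actions (a finite set is an assumption).
    V:        (d, d) regularised Gram matrix, lam * I + sum_t a_t a_t^T.
    b_reward: (d,) vector sum_t r_t * a_t of reward-weighted actions.
    B_safety: (m, d) matrix; row i is sum_t s_{t,i} * a_t for constraint i.
    alpha:    (m,) thresholds; action a is truly safe if Phi @ a <= alpha.
    beta:     confidence radius (a hypothetical fixed constant here).
    """
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b_reward   # ridge estimate of the reward vector
    Phi_hat = B_safety @ V_inv     # ridge estimates of safety rows (V is symmetric)

    # Confidence width along each action: sqrt(a^T V^{-1} a).
    width = np.sqrt(np.einsum("kd,de,ke->k", actions, V_inv, actions))

    # Optimism on reward: upper confidence bound.
    reward_ucb = actions @ theta_hat + beta * width
    # Optimism on safety: lower confidence bound on every risk score.
    risk_lcb = actions @ Phi_hat.T - beta * width[:, None]

    plausibly_safe = np.all(risk_lcb <= alpha, axis=1)
    if not plausibly_safe.any():
        return None  # no action is safe under any plausible model
    scores = np.where(plausibly_safe, reward_ucb, -np.inf)
    return int(np.argmax(scores))
```

With little data the widths are large, so almost every action looks both rewarding and plausibly safe. As observations accumulate along an action's direction, its width shrinks, and an action whose estimated risk clearly exceeds `alpha` is excluded.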
