A Simple and Optimal Policy Design for Online Learning with Safety against Heavy-tailed Risk - Zhuanzhi (专知) Papers

Topics: Optimizer · Multi-armed bandit · Learning · SimPLe · Upper confidence bound

June 10, 2022

A Simple and Optimal Policy Design for Online Learning with Safety against Heavy-tailed Risk

David Simchi-Levi, Zeyu Zheng, Feng Zhu

We design simple and optimal policies that ensure safety against heavy-tailed risk in the classical multi-armed bandit problem. Recently, \cite{fan2021fragility} showed that information-theoretically optimized bandit algorithms suffer from serious heavy-tailed risk; that is, the worst-case probability of incurring a linear regret decays slowly, at a rate of $1/T$, where $T$ is the time horizon. Inspired by their results, we further show that widely used policies such as the standard Upper Confidence Bound policy and the Thompson Sampling policy also incur heavy-tailed risk, and that this heavy-tailed risk in fact exists for all "instance-dependent consistent" policies. To ensure safety against such heavy-tailed risk, for the two-armed bandit setting, we provide a simple policy design that (i) attains worst-case optimality for the expected regret at order $\tilde O(\sqrt{T})$ and (ii) has a worst-case tail probability of incurring a linear regret that decays at an exponential rate $\exp(-\Omega(\sqrt{T}))$. We further prove that this exponential decay rate of the tail probability is optimal across all policies that have worst-case optimality for the expected regret. Finally, we extend the policy design and analysis to the general setting with an arbitrary number $K$ of arms. We provide a detailed characterization of the tail probability bound for any regret threshold under our policy design. Namely, the worst-case probability of incurring a regret larger than $x$ is upper bounded by $\exp(-\Omega(x/\sqrt{KT}))$. Numerical experiments are conducted to illustrate the theoretical findings. Our results reveal the incompatibility between consistency and light-tailed risk, while indicating that worst-case optimality on expected regret and light-tailed risk are compatible.
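The quantity the abstract bounds, the probability that regret exceeds a threshold $x$, can be estimated empirically. The following is a minimal sketch and not the policy proposed in the paper: it runs the textbook UCB1 index on a two-armed bandit and reports the fraction of runs whose pseudo-regret exceeds a few thresholds. The unit-variance Gaussian rewards, the arm means, the horizon, and all function names (`ucb1_pseudo_regret`, `empirical_tail`, `simulate`) are assumptions made for illustration only.

```python
import math
import random


def ucb1_pseudo_regret(means, horizon, rng):
    """Run textbook UCB1 once and return the pseudo-regret.

    Rewards are assumed Gaussian with unit variance (an illustrative choice).
    """
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialise the indices
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]   # gap of the arm just pulled
    return regret


def empirical_tail(regrets, threshold):
    """Fraction of runs whose regret is strictly larger than the threshold."""
    return sum(r > threshold for r in regrets) / len(regrets)


def simulate(means=(0.5, 0.0), horizon=500, runs=200, seed=0):
    """Collect pseudo-regret over independent runs with a fixed seed."""
    rng = random.Random(seed)
    return [ucb1_pseudo_regret(means, horizon, rng) for _ in range(runs)]


if __name__ == "__main__":
    T = 500
    regrets = simulate(horizon=T)
    for x in (math.sqrt(T), 2 * math.sqrt(T), 0.25 * T):
        print(f"P(regret > {x:.1f}) ~ {empirical_tail(regrets, x):.3f}")
```

A few hundred runs at a short horizon only resolve tail probabilities down to roughly 1/runs, so this sketch shows how the tail is measured rather than reproducing the $1/T$ versus $\exp(-\Omega(\sqrt{T}))$ separation stated in the abstract; far more runs would be needed for that.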
