Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds - 专知 Papers


Bandits · Variance · Robustness · Rounds · ARM

June 14, 2022

Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds


Shinji Ito, Taira Tsuchiya, Junya Honda

From arXiv. Accepted for presentation at the 35th Annual Conference on Learning Theory (COLT 2022). Only the extended abstract will appear in the conference proceedings.

This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i: \Delta_i>0} \frac{\log T}{\Delta_i})$ for suboptimality gap $\Delta_i$ of arm $i$ and time horizon $T$. As Audibert et al. [2007] have shown, however, the performance can be improved in stochastic environments with low-variance arms. In fact, they have provided a stochastic MAB algorithm with gap-variance-dependent regret bounds of $O(\sum_{i: \Delta_i>0} (\frac{\sigma_i^2}{\Delta_i} + 1) \log T)$ for loss variance $\sigma_i^2$ of arm $i$. In this paper, we propose the first BOBW algorithm with gap-variance-dependent bounds, showing that the variance information can be used even in a possibly adversarial environment. Further, the leading constant factor in our gap-variance-dependent bound is only (almost) twice the value for the lower bound. Additionally, the proposed algorithm enjoys multiple data-dependent regret bounds in adversarial settings and works well in stochastic settings with adversarial corruptions. The proposed algorithm is based on the follow-the-regularized-leader method and employs adaptive learning rates that depend on the empirical prediction error of the loss, which leads to gap-variance-dependent regret bounds reflecting the variance of the arms.
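To make the difference between the two bounds in the abstract concrete, the following minimal sketch evaluates the expressions inside the $O(\cdot)$ for a small instance. It is only an illustration of the formulas, not the paper's algorithm: the function names, the arm gaps, the variances, and the horizon are all hypothetical, and constant factors are dropped.

```python
import math


def gap_dependent_bound(gaps, horizon):
    """Sum of log(T) / Delta_i over arms with Delta_i > 0 (constants dropped)."""
    return sum(math.log(horizon) / gap for gap in gaps if gap > 0)


def gap_variance_dependent_bound(gaps, variances, horizon):
    """Sum of (sigma_i^2 / Delta_i + 1) * log(T) over arms with Delta_i > 0."""
    return sum(
        (var / gap + 1.0) * math.log(horizon)
        for gap, var in zip(gaps, variances)
        if gap > 0
    )


# Hypothetical instance: arm 0 is optimal (gap 0); the other arms have
# small gaps and very low loss variance.
gaps = [0.0, 0.01, 0.02]
variances = [0.001, 0.0001, 0.0001]
horizon = 10_000

plain = gap_dependent_bound(gaps, horizon)
with_variance = gap_variance_dependent_bound(gaps, variances, horizon)
print(f"gap-dependent expression:          {plain:.2f}")
print(f"gap-variance-dependent expression: {with_variance:.2f}")
```

With small gaps and low variances, the term $\sigma_i^2/\Delta_i + 1$ is much smaller than $1/\Delta_i$, which is the regime in which the variance-aware bound improves on the gap-only bound.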


