We study the $K$-armed dueling bandit problem in both stochastic and adversarial environments, where the learner's goal is to aggregate information through relative preferences over pairs of decision points queried in an online, sequential manner. We first propose a novel reduction from any (general) dueling bandit problem to standard multi-armed bandits; despite its simplicity, it allows us to improve many existing results in dueling bandits. In particular, \emph{we give the first best-of-both-worlds result for the dueling bandit regret minimization problem}: a unified framework that is guaranteed to perform optimally under both stochastic and adversarial preferences simultaneously. Moreover, our algorithm is also the first to achieve an optimal $O(\sum_{i = 1}^K \frac{\log T}{\Delta_i})$ regret bound against the Condorcet-winner benchmark, which scales optimally in both the number of arms $K$ and the instance-specific suboptimality gaps $\{\Delta_i\}_{i = 1}^K$. This resolves the long-standing open problem of designing an instance-wise, gap-dependent, order-optimal regret algorithm for dueling bandits (with matching lower bounds up to small constant factors). We further justify the robustness of our proposed algorithm by proving an optimal regret rate under adversarially corrupted preferences, which improves upon the existing state-of-the-art corrupted dueling bandit results by a large margin. In summary, we believe our reduction idea will find broader applications across a diverse class of dueling bandit settings that are otherwise studied separately from multi-armed bandits, often with more complex solutions and worse guarantees. The efficacy of our proposed algorithms is empirically corroborated against existing dueling bandit methods.
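The abstract states that a black-box reduction from dueling bandits to multi-armed bandits exists, but does not spell it out. A natural instantiation of such a reduction (a sketch under assumptions, not the paper's exact algorithm) runs two independent MAB learners, one choosing the left arm of each duel and one the right, and rewards each learner with the indicator that its arm won. The toy preference model, the choice of EXP3 as the base learner, and all names below are illustrative assumptions.

```python
import math
import random

class Exp3:
    """Standard EXP3 multi-armed bandit learner (adversarial rewards)."""
    def __init__(self, n_arms, eta):
        self.n = n_arms
        self.eta = eta
        self.log_w = [0.0] * n_arms  # log-weights for numerical stability

    def probs(self):
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        s = sum(w)
        return [x / s for x in w]

    def select(self):
        p = self.probs()
        arm = random.choices(range(self.n), weights=p)[0]
        return arm, p

    def update(self, arm, reward, p):
        # importance-weighted reward estimate for the played arm only
        self.log_w[arm] += self.eta * reward / p[arm]

def duel(i, j):
    # hypothetical preference oracle: returns 1 iff arm i beats arm j;
    # a toy stochastic model where a larger score means a better arm
    scores = [0.9, 0.6, 0.5, 0.2]
    p_i_wins = 0.5 + 0.5 * (scores[i] - scores[j])
    return 1 if random.random() < p_i_wins else 0

K, T = 4, 2000
eta = math.sqrt(math.log(K) / (K * T))
left, right = Exp3(K, eta), Exp3(K, eta)
wins = [0] * K
for t in range(T):
    i, p_l = left.select()   # left learner proposes one side of the duel
    j, p_r = right.select()  # right learner proposes the other side
    o = duel(i, j)           # 1 iff the left arm wins the preference query
    left.update(i, o, p_l)       # each learner is rewarded when its arm wins
    right.update(j, 1 - o, p_r)
    wins[i] += o
```

Because each learner sees only a bounded per-round reward (the win indicator), any regret guarantee of the base MAB algorithm transfers to the dueling objective; this is the sense in which existing MAB machinery can be reused rather than redesigned.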