We consider the problem of combining and learning over a set of adversarial bandit algorithms with the goal of adaptively tracking the best one on the fly. The CORRAL algorithm of Agarwal et al. (2017) and its variants (Foster et al., 2020a) achieve this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon. The polynomial dependence on $M$, however, prevents one from deploying these algorithms in many applications where $M$ is poly$(T)$ or even larger. Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only \emph{logarithmic} dependence on $M$, as long as certain conditions are satisfied. As the main example, we apply our recipe to the problem of adversarial linear bandits over a $d$-dimensional $\ell_p$ unit-ball for $p \in (1,2]$. By corralling a large set of $T$ base algorithms, each starting at a different time step, our final algorithm achieves the first optimal switching regret $\widetilde{O}(\sqrt{d S T})$ when competing against a sequence of comparators with $S$ switches (for some known $S$). We further extend our results to linear bandits over a smooth and strongly convex domain as well as unconstrained linear bandits.
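To see concretely why the improvement from polynomial to logarithmic dependence on $M$ is what enables the switching-regret application, consider the following back-of-the-envelope instantiation. It is a sketch based only on the bounds stated above; the specific form $\widetilde{O}(\sqrt{T}\log M)$ assumed for the improved overhead is an illustrative assumption, as the abstract specifies only the logarithmic dependence on $M$. With one base algorithm per possible start time, we have $M = T$, and
\[
\widetilde{O}\big(\sqrt{MT}\big)\Big|_{M=T} \;=\; \widetilde{O}(T)
\qquad\text{vs.}\qquad
\widetilde{O}\big(\sqrt{T}\,\log M\big)\Big|_{M=T} \;=\; \widetilde{O}\big(\sqrt{T}\big).
\]
That is, a CORRAL-style master pays a vacuous linear-in-$T$ overhead in this regime, whereas an overhead that is only logarithmic in $M$ stays of the same $\sqrt{T}$ order as the target $\widetilde{O}(\sqrt{d S T})$ switching regret.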