Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design - 专知 (Zhuanzhi) Papers


Bandits · Optimizers · Batch learning · Linear models · Multi-armed bandits

April 23, 2021

Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design


Yufei Ruan, Jiaqi Yang, Yuan Zhou

Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints on linear contextual bandits, a central problem in online active learning. We consider two popular limited-adaptivity models in the literature: batch learning and rare policy switches. We show that, when the context vectors are adversarially chosen in $d$-dimensional linear contextual bandits, the learner needs $O(d \log d \log T)$ policy switches to achieve the minimax-optimal regret, and this is optimal up to $\mathrm{poly}(\log d, \log \log T)$ factors; for stochastic context vectors, even in the more restricted batch learning model, only $O(\log \log T)$ batches are needed to achieve the optimal regret. Together with known results in the literature, our results present a complete picture of the adaptivity constraints in linear contextual bandits. Along the way, we propose the distributional optimal design, a natural extension of the optimal experiment design, and provide a learning algorithm for the problem that is both statistically and computationally efficient, which may be of independent interest.
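The abstract does not say how the batches are scheduled. As a rough intuition for why on the order of $\log \log T$ batches can be enough, the sketch below builds the geometric grid $t_k \approx T^{1 - 2^{-k}}$ that is commonly used in earlier batched-bandit work. The function name `batch_grid` and the exact rounding are illustrative assumptions, and this is not claimed to be the authors' schedule.

```python
import math


def batch_grid(T: int) -> list[int]:
    """Hypothetical helper: batch end points t_1 < ... < t_M = T.

    Uses the grid t_k = floor(T^(1 - 2^-k)), an assumption borrowed from
    the general batched-bandit literature, not taken from this abstract.
    """
    if T < 2:
        raise ValueError("T must be at least 2")
    # Number of batches: about log2(log2(T)) interior points plus a final batch.
    M = math.ceil(math.log2(math.log2(T))) + 1
    grid: list[int] = []
    for k in range(1, M):
        t = int(T ** (1.0 - 2.0 ** -k))
        # Skip degenerate or repeated end points (only matters for tiny T).
        if t >= 1 and (not grid or t > grid[-1]):
            grid.append(t)
    grid.append(T)  # the last batch always runs to the horizon
    return grid


if __name__ == "__main__":
    T = 10**6
    grid = batch_grid(T)
    print(f"T = {T}: {len(grid)} batches ending at {grid}")
```

With this grid the exponent gap to 1 halves at every step, so after about $\log_2 \log_2 T$ interior end points the next-to-last batch already ends at or beyond $T/2$, which is why the number of batches grows only doubly logarithmically in $T$.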

