There has been a recent surge of interest in nonparametric bandit algorithms based on subsampling. One drawback of these approaches, however, is the additional complexity incurred by random subsampling and by storing the full history of rewards. Our first contribution is to show that a simple deterministic subsampling rule, proposed in the recent work of Baudry et al. (2020) under the name "last-block subsampling", is asymptotically optimal in one-parameter exponential families. In addition, we prove that these guarantees also hold when the algorithm's memory is limited to a polylogarithmic function of the time horizon. These findings open up new perspectives, in particular for non-stationary scenarios in which the arm distributions evolve over time. We propose a variant of the algorithm in which only the most recent observations are used for subsampling, and show that it achieves optimal regret guarantees under the assumption of a known number of abrupt changes. Extensive numerical simulations highlight the merits of this approach, particularly when the changes affect more than just the means of the rewards.
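To make the idea of deterministic last-block subsampling concrete, the following is a minimal sketch of one round of a subsampling duelling scheme in the spirit of Baudry et al. (2020): each challenger arm's empirical mean is compared against the mean of the leader's most recent observations (its "last block") rather than a random subsample. Function and variable names are ours, and the tie-breaking and duel rules are simplified assumptions, not the paper's exact specification.

```python
import numpy as np

def last_block_round(history):
    """One illustrative round of last-block subsampling (a sketch, not the
    authors' exact algorithm).

    history: list of 1-D numpy arrays, history[k] = rewards observed so far
             from arm k. Returns the list of arm indices to pull this round.
    """
    counts = [len(h) for h in history]
    # The leader is the arm with the most observations (ties broken by mean).
    leader = max(range(len(history)),
                 key=lambda k: (counts[k], history[k].mean()))
    to_pull = []
    for k in range(len(history)):
        if k == leader:
            continue
        # Deterministic "last-block" subsample: the leader's n_k most recent
        # rewards, instead of a uniformly random subset of its history.
        last_block = history[leader][-counts[k]:]
        # Duel: the challenger is pulled if its empirical mean matches or
        # beats the mean of the leader's last block.
        if history[k].mean() >= last_block.mean():
            to_pull.append(k)
    # If no challenger wins its duel, the leader itself is pulled.
    return to_pull if to_pull else [leader]
```

Because the subsample is always the suffix of the leader's reward stream, only the most recent observations ever need to be kept, which is what permits the polylogarithmic memory bound and the sliding-window variant for abruptly changing environments.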