Streaming Algorithms for Stochastic Multi-armed Bandits - Zhuanzhi Papers

Streaming · Bandits (slot machines) · ARM · Sample complexity · Samples

December 9, 2020

Streaming Algorithms for Stochastic Multi-armed Bandits

Arnab Maiti, Vishakha Patil, Arindam Khan

From arXiv: 24 pages, 2 figures, 4 algorithms

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in memory at any time is bounded. The decision-maker can only pull arms that are present in memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight hardness result: we show $\Omega(T^{2/3})$ cumulative regret in expectation for an arm-memory size of $(n-1)$, where $n$ is the number of arms. For best-arm identification, we study two algorithms. First, we present an $O(r)$ arm-memory, $r$-round adaptive streaming algorithm to find an $\epsilon$-best arm. In an $r$-round adaptive streaming algorithm for best-arm identification, the arm pulls in each round are decided based on the outcomes observed in the earlier rounds, and the best arm is output at the end of the $r$ rounds. The upper bound on the sample complexity of our algorithm matches the lower bound for any $r$-round adaptive streaming algorithm. Second, we present a heuristic that finds an $\epsilon$-best arm with optimal sample complexity while storing only one extra arm in memory.
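
To make the bounded arm-memory model concrete, the Python sketch below simulates it under stated assumptions: rewards are Bernoulli, arms are identified by their arrival index, and `BoundedArmMemory`, `empirical_mean`, and `champion_challenger` are hypothetical names introduced here for illustration only. The memory object refuses to hold more than `capacity` arms and refuses to pull an arm it does not store, mirroring the rule that the decision-maker can only pull arms present in memory. `champion_challenger` is a deliberately naive two-slot heuristic; it is not the authors' $O(r)$ arm-memory algorithm or their one-extra-arm heuristic, and it carries no $\epsilon$-best guarantee.

```python
import math
import random
from typing import Dict, List, Optional, Tuple


class BoundedArmMemory:
    """Hypothetical model of bounded arm-memory: at most `capacity` arms are
    stored at once, and only stored arms can be pulled."""

    def __init__(self, capacity: int, rng: random.Random) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.rng = rng
        self._means: Dict[int, float] = {}  # hidden Bernoulli means of stored arms
        self.total_pulls = 0
        self.peak_usage = 0

    def store(self, arm_id: int, mean: float) -> None:
        if len(self._means) >= self.capacity:
            raise RuntimeError("arm memory is full; discard an arm first")
        self._means[arm_id] = mean
        self.peak_usage = max(self.peak_usage, len(self._means))

    def discard(self, arm_id: int) -> None:
        # Once discarded, an arm cannot be recalled: the stream is read once.
        del self._means[arm_id]

    def pull(self, arm_id: int) -> int:
        if arm_id not in self._means:
            raise KeyError(f"arm {arm_id} is not in memory")
        self.total_pulls += 1
        return 1 if self.rng.random() < self._means[arm_id] else 0


def empirical_mean(memory: BoundedArmMemory, arm_id: int, num_pulls: int) -> float:
    """Average reward over `num_pulls` fresh pulls of a stored arm."""
    return sum(memory.pull(arm_id) for _ in range(num_pulls)) / num_pulls


def champion_challenger(
    stream_means: List[float], epsilon: float, delta: float, seed: int = 0
) -> Tuple[int, BoundedArmMemory]:
    """Naive one-pass heuristic using arm-memory 2 (champion + arriving arm).

    Each duel pulls both arms m times, with an illustrative Hoeffding-style
    m = ceil((2 / epsilon**2) * ln(2n / delta)), and keeps the empirically
    better arm. Not the paper's algorithm: there is no epsilon-best guarantee,
    because the champion's true mean can drop by up to about epsilon per duel.
    """
    if not stream_means:
        raise ValueError("the stream must contain at least one arm")
    n = len(stream_means)
    memory = BoundedArmMemory(capacity=2, rng=random.Random(seed))
    m = math.ceil((2.0 / epsilon**2) * math.log(2.0 * n / delta))
    champion: Optional[int] = None
    for arm_id, mean in enumerate(stream_means):  # arms arrive one at a time
        memory.store(arm_id, mean)
        if champion is None:
            champion = arm_id
            continue
        champion_hat = empirical_mean(memory, champion, m)
        challenger_hat = empirical_mean(memory, arm_id, m)
        if challenger_hat > champion_hat:
            memory.discard(champion)  # free the slot held by the old champion
            champion = arm_id
        else:
            memory.discard(arm_id)  # the challenger is dropped for good
    assert champion is not None
    return champion, memory


if __name__ == "__main__":
    best, mem = champion_challenger([0.1, 0.2, 0.9, 0.3], epsilon=0.1, delta=0.05)
    print("selected arm:", best, "peak memory:", mem.peak_usage, "pulls:", mem.total_pulls)
```

Keeping only a champion and the arriving arm makes the memory footprint constant, but re-sampling the champion in every duel costs about $2m$ pulls per arriving arm, and per-duel errors can accumulate along the stream, so this sketch should not be read as an $\epsilon$-best-arm algorithm.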

Related content

INRIA's latest "Machine Learning Theory" course notes, 176-page PDF

Zhuanzhi Member Services

51+ reads · December 14, 2020

[DeepMind] Reinforcement learning tutorial, 83-slide PPT

Zhuanzhi Member Services

155+ reads · August 7, 2020

[Columbia University] Course: Economics, AI, and Optimization

Zhuanzhi Member Services

53+ reads · February 15, 2020

[Stanford University] Gradient Surgery for Multi-Task Learning

Zhuanzhi Member Services

47+ reads · January 23, 2020

Online variational inference, 76-slide PPT: A Regret Bound for Online Variational Inference

Zhuanzhi Member Services

21+ reads · December 2, 2019

[BAAI Conference 2019] Optimization for Overparametrized Deep Neural Networks, Peking University | Liwei Wang

Zhuanzhi Member Services

23+ reads · November 21, 2019

[Course] Princeton University, Spring 2019: "Optimization for Machine Learning" lecture notes

Zhuanzhi Member Services

85+ reads · October 29, 2019

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Zhuanzhi Member Services

49+ reads · October 17, 2019

A review of machine learning frameworks in 2019

Zhuanzhi Member Services

36+ reads · October 11, 2019

[AI in 2019: A Year in Review] The pushback against AI, AI in 2019: A Year in Review

Zhuanzhi Member Services

79+ reads · October 10, 2019

[Resources] A collection of speech enhancement resources

Zhuanzhi

8+ reads · July 4, 2020

LibRec Picks: AutoML for Contextual Bandits

LibRec Intelligent Recommendation

7+ reads · September 19, 2019

Transferring Knowledge across Learning Processes

CreateAMind

28+ reads · May 18, 2019

Deleted

Jiangmen Ventures

5+ reads · April 15, 2019

The true scripture of RL

CreateAMind

5+ reads · December 28, 2018

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

17+ reads · December 24, 2018

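[Paper recommendations] Seven recent reinforcement learning papers: logical constraints, a survey, multi-task deep reinforcement learning, parameter servers, event extraction, hierarchical reinforcement learning, and overfitting

Zhuanzhi

25+ reads · April 29, 2018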

A beginner's guide to distributed TensorFlow

Machine Learning Research Society

4+ reads · November 28, 2017

[Recommended] Direct Future Prediction: reinforcement learning as supervised learning

Machine Learning Research Society

6+ reads · November 24, 2017

Reinforcement learning: cartpole_a3c

CreateAMind

9+ reads · July 21, 2017

Manipulability optimization for multi-arm teleoperation

Arxiv

0+ reads · February 10, 2021

Player Modeling via Multi-Armed Bandits

Arxiv

0+ reads · February 10, 2021

Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed Bandits

Arxiv

0+ reads · February 10, 2021

Nonstochastic Bandits with Infinitely Many Experts

Arxiv

0+ reads · February 9, 2021

Robust Bandit Learning with Imperfect Context

Arxiv

0+ reads · February 9, 2021

RL for Latent MDPs: Regret Guarantees and a Lower Bound

Arxiv

0+ reads · February 9, 2021

Berry--Esseen Bounds for Multivariate Nonlinear Statistics with Applications to M-estimators and Stochastic Gradient Descent Algorithms

Arxiv

0+ reads · February 9, 2021

A Multi-Arm Bandit Approach To Subset Selection Under Constraints

Arxiv

0+ reads · February 9, 2021

An Efficient Algorithm for Cooperative Semi-Bandits

Arxiv

0+ reads · February 9, 2021

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+ reads · July 16, 2018
