Motivated by emerging applications such as live-streaming e-commerce, promotions, and recommendations, we introduce a general class of multi-armed bandit problems with the following two features: (i) the decision maker can pull and collect rewards from at most $K$ out of $N$ different arms in each time period; (ii) the expected reward of an arm drops immediately after it is pulled and then recovers non-parametrically as its idle time increases. With the objective of maximizing the expected cumulative reward over $T$ time periods, we propose, construct, and prove performance guarantees for a class of "Purely Periodic Policies". For the offline problem, where all model parameters are known, our proposed policy achieves an approximation ratio on the order of $1-\mathcal O(1/\sqrt{K})$, which is asymptotically optimal as $K$ grows to infinity. For the online problem, where the model parameters are unknown and must be learned, we design an Upper Confidence Bound (UCB) based policy that incurs approximately $\widetilde{\mathcal O}(N\sqrt{T})$ regret against the offline benchmark. Our framework and policy design may be adapted to other offline planning and online learning applications with non-stationary and recovering rewards.
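As a minimal illustration of the setting described above, the sketch below simulates $N$ arms whose expected rewards recover with idle time and a fixed "purely periodic" schedule that pulls at most $K$ arms per period. The exponential recovery curves (`r_max`, `tau`) and the specific periods and offsets are illustrative assumptions for this toy example; they are not the paper's actual recovery model or schedule construction.

```python
import numpy as np

# Toy simulator: N arms, at most K pulls per period, and each arm's expected
# reward depends only on the idle time since its last pull (recovering rewards).
# Recovery curves and the periodic schedule below are illustrative assumptions.

rng = np.random.default_rng(0)
N, K, T = 6, 2, 1000

# Assumed recovery curve: r_max[i] * (1 - exp(-idle / tau[i])),
# i.e., the reward climbs back toward r_max[i] as the arm rests.
r_max = rng.uniform(0.5, 1.0, size=N)
tau = rng.uniform(1.0, 5.0, size=N)

def expected_reward(arm, idle):
    """Expected reward of `arm` when pulled after `idle` periods of rest."""
    return r_max[arm] * (1.0 - np.exp(-idle / tau[arm]))

# A purely periodic schedule: arm i is pulled every period[i] steps with a
# phase offset chosen so that no more than K arms are pulled in any period.
period = np.array([3, 3, 3, 6, 6, 6])
offset = np.array([0, 1, 2, 3, 4, 5])

idle = np.full(N, 100.0)   # start all arms fully rested
total_reward = 0.0
for t in range(T):
    pulls = [i for i in range(N)
             if t >= offset[i] and (t - offset[i]) % period[i] == 0]
    assert len(pulls) <= K   # the schedule respects the per-period pull budget
    for i in pulls:
        total_reward += expected_reward(i, idle[i])
        idle[i] = 0.0
    idle += 1.0

print(f"average per-period reward: {total_reward / T:.3f}")
```

Varying `period` and `offset` (while keeping at most $K$ pulls per period) changes the trade-off between pulling an arm often and letting its reward recover, which is the tension the purely periodic policies in the paper are designed to balance.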