Motivated by emerging applications such as live-streaming e-commerce, promotions, and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems with the following two features: (i) the decision maker can pull and collect rewards from up to $K\,(\ge 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm drops immediately after it is pulled, and then recovers non-parametrically as the arm's idle time increases. With the objective of maximizing the expected cumulative reward over $T$ time periods, we design a class of ``Purely Periodic Policies'' that jointly set a pulling period for each arm. For the proposed policies, we prove performance guarantees for both the offline and online problems. For the offline problem, where all model parameters are known, the proposed periodic policy attains an approximation ratio of $1-\mathcal{O}(1/\sqrt{K})$, which is asymptotically optimal as $K$ grows to infinity. For the online problem, where the model parameters are unknown and must be learned dynamically, we integrate the offline periodic policy with the upper confidence bound (UCB) procedure to construct an online policy. The proposed online policy is proved to have approximately $\widetilde{\mathcal{O}}(N\sqrt{T})$ regret against the offline benchmark. Our framework and policy design may shed light on broader offline planning and online learning applications with non-stationary and recovering rewards.
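To make the setting concrete, below is a minimal simulation sketch of the recovering-reward model under a purely periodic schedule. The saturating recovery curves, the per-arm periods, the offsets, and all numerical constants are illustrative assumptions chosen for exposition; they are not the paper's optimized construction or analysis.

```python
# Sketch: N arms, pull up to K per period; an arm's expected reward depends only
# on its idle time since the last pull (nondecreasing "recovery"). A purely
# periodic policy pulls arm i every d[i] periods at a fixed offset, staggered so
# that at most K arms are scheduled in any single period.
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 6, 2, 1000                       # arms, pulls per period, horizon

# Hypothetical recovery curves R_i(tau): expected reward after tau idle periods.
caps = rng.uniform(0.5, 1.0, size=N)       # long-run reward levels (assumption)
rates = rng.uniform(0.1, 0.5, size=N)      # recovery speeds (assumption)

def expected_reward(i, tau):
    """Saturating stand-in for the unknown nonparametric recovery of arm i."""
    return caps[i] * (1.0 - np.exp(-rates[i] * tau))

# Hand-picked periods and offsets (not the paper's optimized schedule): arms
# 0-2 share period 3, arms 3-5 share period 6, offsets staggered to respect K.
d = np.array([3, 3, 3, 6, 6, 6])
offset = np.array([0, 1, 2, 3, 4, 5]) % d

idle = np.zeros(N)                         # periods since each arm was last pulled
total = 0.0
for t in range(T):
    scheduled = [i for i in range(N) if t % d[i] == offset[i]][:K]
    for i in scheduled:
        total += expected_reward(i, idle[i])
        idle[i] = 0                        # pulling resets the idle time
    for i in range(N):
        if i not in scheduled:
            idle[i] += 1                   # idle arms keep recovering

print(f"average per-period reward: {total / T:.3f}")
```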