保守探索统一框架 (A Unified Framework for Conservative Exploration) - 专知论文

会员服务 ·

0

赌博机/老虎机 · Bandits · 线性的 · 情景 · Performer ·

2021 年 6 月 22 日

A Unified Framework for Conservative Exploration

翻译：保守探索统一框架

Yunchang Yang,Tianhao Wu,Han Zhong,Evrard Garcelon,Matteo Pirotta,Alessandro Lazaric,Liwei Wang,Simon S. Du

We study bandits and reinforcement learning (RL) subject to a conservative constraint where the agent is asked to perform at least as well as a given baseline policy. This setting is particular relevant in real-world domains including digital marketing, healthcare, production, finance, etc. For multi-armed bandits, linear bandits and tabular RL, specialized algorithms and theoretical analyses were proposed in previous work. In this paper, we present a unified framework for conservative bandits and RL, in which our core technique is to calculate the necessary and sufficient budget obtained from running the baseline policy. For lower bounds, our framework gives a black-box reduction that turns a certain lower bound in the nonconservative setting into a new lower bound in the conservative setting. We strengthen the existing lower bound for conservative multi-armed bandits and obtain new lower bounds for conservative linear bandits, tabular RL and low-rank MDP. For upper bounds, our framework turns a certain nonconservative upper-confidence-bound (UCB) algorithm into a conservative algorithm with a simple analysis. For multi-armed bandits, linear bandits and tabular RL, our new upper bounds tighten or match existing ones with significantly simpler analyses. We also obtain a new upper bound for conservative low-rank MDP.

翻译：我们研究强盗和强化学习(RL),但受保守限制,即要求代理人至少执行某项基线政策,这种环境在现实世界领域,包括数字营销、保健、生产、金融等领域特别相关。对于多武装强盗、线性强盗和表格RL,在以前的工作中提出了专门的算法和理论分析。在本文中,我们提出了一个保守强盗和RL的统一框架,我们的核心技术是计算从执行基线政策中获得的必要和足够的预算。对于下限,我们的框架提供了黑箱削减,将非保守环境中的某种较低约束转化为新的较低约束,在保守环境中,我们加强了现有的保守多武装强盗的较低约束,并为保守的线性强盗获得了新的较低界限,表格RL和低级MDP。对于上限,我们的框架将某种非保守性的上限(UCB)算法转换为一种带有简单分析的保守性算法。对于多武装强盗、线性强盗和表式RL,我们新的上限线性强盗或匹配现有的低级MRDP。我们还获得了一个新的新的新框,用于大幅度的上层保守分析。

0

相关内容

赌博机/老虎机

赌博机/老虎机

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【WSDN 2020 论文】一种结构图表示学习框架（A Structural Graph Representation Learning Framework）

【WSDN 2020 论文】一种结构图表示学习框架（A Structural Graph Representation Learning Framework）

专知会员服务

74+阅读 · 2019年11月20日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

多任务学习(Multitask-Learning)相关资料、经典论文、开源代码整理分享

多任务学习(Multitask-Learning)相关资料、经典论文、开源代码整理分享

深度学习与NLP

45+阅读 · 2019年10月22日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Exclusive Group Lasso for Structured Variable Selection

Arxiv

0+阅读 · 2021年8月23日

A Novel Framework for Online Supervised Learning with Feature Selection

Arxiv

0+阅读 · 2021年8月21日

Conservative Objective Models for Effective Offline Model-Based Optimization

Arxiv

4+阅读 · 2021年7月14日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Arxiv

23+阅读 · 2021年3月3日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

Information-Directed Exploration for Deep Reinforcement Learning

Information-Directed Exploration for Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年12月18日

Unsupervised Meta-Learning for Reinforcement Learning

Arxiv

8+阅读 · 2018年6月12日

Constrained-CNN losses forweakly supervised segmentation

Arxiv

5+阅读 · 2018年5月12日

Compassionately Conservative Balanced Cuts for Image Segmentation

Arxiv

5+阅读 · 2018年3月27日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【WSDN 2020 论文】一种结构图表示学习框架（A Structural Graph Representation Learning Framework）

【WSDN 2020 论文】一种结构图表示学习框架（A Structural Graph Representation Learning Framework）

专知会员服务

74+阅读 · 2019年11月20日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《面向未来：军事应用中基于人工智能融合的场景分析及其对全球安全的影响》

《美陆军特种作战条令》最新102页

《美军条令：斯特赖克步兵步枪排与班作战条令》最新450页

美空军作战实验室通过人工智能和指挥控制技术创新推进杀伤链

相关资讯

多任务学习(Multitask-Learning)相关资料、经典论文、开源代码整理分享

多任务学习(Multitask-Learning)相关资料、经典论文、开源代码整理分享

深度学习与NLP

45+阅读 · 2019年10月22日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Exclusive Group Lasso for Structured Variable Selection

Arxiv

0+阅读 · 2021年8月23日

A Novel Framework for Online Supervised Learning with Feature Selection

Arxiv

0+阅读 · 2021年8月21日

Conservative Objective Models for Effective Offline Model-Based Optimization

Arxiv

4+阅读 · 2021年7月14日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Arxiv

23+阅读 · 2021年3月3日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

Information-Directed Exploration for Deep Reinforcement Learning

Information-Directed Exploration for Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年12月18日

Unsupervised Meta-Learning for Reinforcement Learning

Arxiv

8+阅读 · 2018年6月12日

Constrained-CNN losses forweakly supervised segmentation

Arxiv

5+阅读 · 2018年5月12日

Compassionately Conservative Balanced Cuts for Image Segmentation

Arxiv

5+阅读 · 2018年3月27日

微信扫码咨询专知VIP会员