We introduce a new constrained optimization method for policy gradient reinforcement learning that uses two trust regions to regulate each policy update. In addition to using proximity to a single old policy as the first trust region, as in prior work, we propose to form a second trust region by constructing a virtual policy that represents a wide range of past policies. We then constrain the new policy to stay close to this virtual policy, which is beneficial when the old policy performs poorly. More importantly, we propose a mechanism that automatically builds the virtual policy from a memory buffer of past policies, providing a new capability for dynamically selecting appropriate trust regions during optimization. Our method, dubbed Memory-Constrained Policy Optimization (MCPO), is evaluated on a diverse suite of environments including robotic locomotion control, navigation with sparse rewards, and Atari games, and consistently demonstrates competitive performance against recent on-policy constrained policy gradient methods.
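To make the two-trust-region idea concrete, one schematic way to write such an update is sketched below; the symbols $\delta_1$, $\delta_2$, and the mixture weights $w_k$ are illustrative assumptions for this sketch, not MCPO's actual objective, which is defined later in the paper.
$$
\max_{\theta}\;\mathbb{E}_{(s,a)\sim \pi_{\text{old}}}\!\left[\frac{\pi_\theta(a\mid s)}{\pi_{\text{old}}(a\mid s)}\,\hat{A}(s,a)\right]
\quad \text{s.t.}\quad
D_{\mathrm{KL}}\!\left(\pi_{\text{old}}\,\|\,\pi_\theta\right)\le\delta_1,
\quad
D_{\mathrm{KL}}\!\left(\tilde{\pi}\,\|\,\pi_\theta\right)\le\delta_2,
\quad
\tilde{\pi}=\sum_{k} w_k\,\pi_k,
$$
where $\pi_{\text{old}}$ is the most recent policy, the $\pi_k$ are past policies stored in the memory buffer, and $\tilde{\pi}$ is the virtual policy they induce.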