Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deployment to safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality guarantees. This paper overcomes these issues from the perspective of probabilistic inference. We introduce a novel Expectation-Maximization approach that naturally incorporates constraints during policy learning: 1) a provably optimal non-parametric variational distribution is computed in closed form after solving a convex optimization problem (E-step); 2) the policy parameters are improved within a trust region based on the optimal variational distribution (M-step). The proposed algorithm decomposes the safe RL problem into a convex optimization phase and a supervised learning phase, which yields more stable training. Extensive experiments on continuous robotic tasks show that the proposed method achieves significantly better constraint satisfaction and sample efficiency than baselines. The code is available at https://github.com/liuzuxin/cvpo-safe-rl.
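To make the two-phase EM structure described above concrete, the following is a minimal sketch, not the authors' implementation: the synthetic Q-value arrays (`q_reward`, `q_cost`), the KL radius `kl_eps`, the cost threshold `cost_limit`, the state-independent Gaussian policy, and the use of SciPy's L-BFGS-B solver are all illustrative assumptions, and the M-step is reduced to a weighted maximum-likelihood fit with the trust-region constraint omitted.

```python
# Sketch of the E-step (convex dual + closed-form variational distribution)
# and a simplified supervised M-step, under the assumptions stated above.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Synthetic per-state action samples and their (assumed) reward/cost Q-values.
n_states, n_actions, act_dim = 32, 16, 2
actions = rng.normal(size=(n_states, n_actions, act_dim))    # a ~ pi_old(.|s)
q_reward = rng.normal(size=(n_states, n_actions))            # Q_r(s, a), assumed given
q_cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))   # Q_c(s, a), assumed given

kl_eps = 0.1      # E-step KL radius around the old policy (assumed value)
cost_limit = 0.5  # constraint threshold (assumed value)

# ---- E-step: solve a convex dual over (eta, lambda); q is then closed form ----
def dual(params):
    eta, lam = params
    adv = (q_reward - lam * q_cost) / eta                     # shape (S, K)
    # log of the per-state mean of exp(adv), averaged over states
    log_mean_exp = logsumexp(adv, axis=1) - np.log(n_actions)
    return eta * kl_eps + lam * cost_limit + eta * log_mean_exp.mean()

res = minimize(dual, x0=np.array([1.0, 1.0]),
               bounds=[(1e-6, None), (0.0, None)], method="L-BFGS-B")
eta, lam = res.x

# Closed-form non-parametric variational distribution over the sampled actions.
logits = (q_reward - lam * q_cost) / eta
weights = np.exp(logits - logsumexp(logits, axis=1, keepdims=True))  # (S, K)

# ---- M-step: supervised fit of a parametric policy to q (trust region omitted) ----
# Weighted MLE for a state-independent Gaussian policy, standing in for the
# trust-region-regularized supervised learning step described in the abstract.
w = weights / weights.sum()
mean = np.einsum("sk,ska->a", w, actions)
std = np.sqrt(np.einsum("sk,ska->a", w, (actions - mean) ** 2))

print(f"E-step dual solution: eta={eta:.3f}, lambda={lam:.3f}")
print("M-step Gaussian policy:", mean.round(3), std.round(3))
```

In this sketch the E-step is a low-dimensional convex program whose solution fixes the temperature and the constraint multiplier, after which the variational distribution is available in closed form; the M-step then reduces to a supervised (weighted likelihood) fit, mirroring the convex-optimization-plus-supervised-learning decomposition highlighted in the abstract.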