Constrained Update Projection Approach to Safe Policy Optimization - 专知论文


September 15, 2022

Constrained Update Projection Approach to Safe Policy Optimization


Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan

From arXiv. Accepted by NeurIPS 2022. arXiv admin note: substantial text overlap with arXiv:2202.07565; text overlap with arXiv:2002.06506 by other authors.

Safe reinforcement learning (RL) studies problems in which an intelligent agent must not only maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a novel policy optimization method based on the Constrained Update Projection framework that enjoys a rigorous safety guarantee. Central to the development of CUP are the newly proposed surrogate functions along with the performance bound. Compared to previous safe RL methods, CUP offers three benefits: 1) it generalizes the surrogate functions to the generalized advantage estimator (GAE), leading to strong empirical performance; 2) it unifies performance bounds, providing better understanding and interpretability of some existing algorithms; 3) it provides a non-convex implementation that uses only first-order optimizers and does not require any strong approximation on the convexity of the objectives. To validate CUP, we compared it against a comprehensive list of safe RL baselines on a wide range of tasks. Experiments show the effectiveness of CUP in terms of both reward and safety constraint satisfaction. We have released the source code of CUP at https://github.com/RL-boxes/Safe-RL/tree/main/CUP.
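The abstract refers to the generalized advantage estimator (GAE) without spelling it out. As a reminder of what that estimator computes, here is a minimal sketch of the standard GAE recursion in plain Python. It is not taken from the CUP code base: the function name `gae_advantages`, its argument layout, and the default values of `gamma` and `lam` are assumptions made for illustration. In a safe RL setting one would presumably run the same routine on the cost signal as well as the reward, but that is also an assumption here rather than a description of the authors' implementation.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,  # discount factor (illustrative default)
    lam: float = 0.95,    # GAE mixing parameter lambda (illustrative default)
) -> List[float]:
    """Standard GAE over one trajectory segment.

    `values` must hold len(rewards) + 1 entries: the value estimate for
    each visited state plus a bootstrap value for the state after the
    last reward (use 0.0 if the episode terminated there).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the advantage of the next step:
    # A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # With gamma = lam = 1 and zero value estimates, GAE reduces to the
    # undiscounted reward-to-go.
    print(gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
    # prints [3.0, 2.0, 1.0]
```

Setting `lam=0.0` collapses the estimate to the one-step TD residual, while `lam=1.0` gives the Monte Carlo style return minus the baseline; intermediate values trade bias against variance.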






