Reinforcement learning (RL) has achieved promising results on many robotic control tasks. Safety of learning-based controllers is essential to ensuring their effectiveness. Current methods impose the same tight constraints throughout training, resulting in inefficient exploration in the early stage. In this paper, we propose Constrained Policy Optimization with Extra Safety Budget (ESB-CPO), an algorithm that strikes a balance between exploration and constraint satisfaction. In the early stage, our method loosens the practical constraints on unsafe transitions by adding an extra safety budget, guided by a new metric we propose. As training proceeds, the constraints in our optimization problem become tighter. Theoretical analysis and practical experiments demonstrate that our method gradually meets the cost limit in the final training stage. When evaluated on the Safety-Gym and Bullet-Safety-Gym benchmarks, our method shows advantages over baseline algorithms in terms of both safety and optimality. Notably, our method achieves a significant performance improvement over the CPO algorithm under the same cost limit.