DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning


October 18, 2022


Archana Bura, Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai, Jean-Francois Chamberland

From arXiv. Accepted to NeurIPS 2022.

Safe reinforcement learning is extremely challenging: not only must the agent explore an unknown environment, it must do so while ensuring no safety constraint violations. We formulate this safe reinforcement learning (RL) problem using the framework of a finite-horizon Constrained Markov Decision Process (CMDP) with an unknown transition probability function, where we model the safety requirements as constraints on the expected cumulative costs that must be satisfied during all episodes of learning. We propose a model-based safe RL algorithm that we call Doubly Optimistic and Pessimistic Exploration (DOPE), and show that it achieves an objective regret $\tilde{O}(|\mathcal{S}|\sqrt{|\mathcal{A}| K})$ without violating the safety constraints during learning, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, and $K$ is the number of learning episodes. Our key idea is to combine a reward bonus for exploration (optimism) with a conservative constraint (pessimism), in addition to the standard optimistic model-based exploration. DOPE not only improves the objective regret bound, but also shows a significant empirical performance improvement compared to earlier optimism-pessimism approaches.
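The pairing of optimism and pessimism described in the abstract can be illustrated with a toy sketch. This is not the DOPE algorithm itself: the abstract does not state the bonus formula or the planning step, so the count-based bonus `scale / sqrt(visits)`, the single-step action choice, the fallback to a known safe action, and all function and parameter names below are assumptions made for illustration only.

```python
import math


def confidence_bonus(visits: int, scale: float = 1.0) -> float:
    """Uncertainty width that shrinks as 1/sqrt(visits).

    Assumed form for illustration; not taken from the paper.
    """
    return scale / math.sqrt(max(visits, 1))


def select_action(reward_est, cost_est, visits, budget, safe_action, scale=1.0):
    """Choose the action with the highest optimistic reward among the
    actions whose pessimistic cost still fits within the budget.

    Falls back to `safe_action` (a hypothetical known-safe default)
    when no action passes the pessimistic cost check.
    """
    best_action, best_value = safe_action, -math.inf
    for action in reward_est:
        bonus = confidence_bonus(visits[action], scale)
        optimistic_reward = reward_est[action] + bonus  # optimism: inflate reward
        pessimistic_cost = cost_est[action] + bonus     # pessimism: inflate cost
        if pessimistic_cost <= budget and optimistic_reward > best_value:
            best_action, best_value = action, optimistic_reward
    return best_action


# Toy estimates: the risky action pays more but also costs more.
reward_est = {"safe": 0.2, "risky": 0.9}
cost_est = {"safe": 0.1, "risky": 0.4}

# Few visits to "risky": its inflated cost exceeds the budget, so it is excluded.
early = select_action(reward_est, cost_est, {"safe": 100, "risky": 4}, 0.5, "safe")

# Many visits to "risky": the bonus has shrunk and the action is now admissible.
late = select_action(reward_est, cost_est, {"safe": 100, "risky": 400}, 0.5, "safe")
```

In this sketch the same uncertainty width is added to both the reward and the cost estimate, so a poorly explored action looks attractive but is ruled out until enough data shows that its cost fits the budget.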


