Safe Reinforcement Learning via Shielding under Partial Observability - Zhuanzhi (专知) Papers

August 23, 2022

Safe Reinforcement Learning via Shielding under Partial Observability

Steven Carr, Nils Jansen, Sebastian Junges, Ufuk Topcu

From arXiv: 21 pages, 28 figures, 3 tables

Safe exploration is a common problem in reinforcement learning (RL) that aims to prevent agents from making disastrous decisions while exploring their environment. A family of approaches to this problem assumes domain knowledge in the form of a (partial) model of this environment to decide upon the safety of an action. A so-called shield forces the RL agent to select only safe actions. However, for adoption in various applications, one must look beyond enforcing safety and also ensure the applicability of RL with good performance. We extend the applicability of shields via tight integration with state-of-the-art deep RL, and provide an extensive empirical study in challenging, sparse-reward environments under partial observability. We show that a carefully integrated shield ensures safety and can improve the convergence rate and final performance of RL agents. We furthermore show that a shield can be used to bootstrap state-of-the-art RL agents: they remain safe after initial learning in a shielded setting, allowing us to eventually disable a potentially too conservative shield.
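
The abstract does not say how the shield is computed or how it is wired into the deep RL agent, so the following is only a minimal sketch of the general idea of a shield restricting an agent to safe actions. Everything in it is an assumption made for illustration: the toy four-state model, the names `successors`, `shield`, and `shielded_sample`, the representation of partial observability as a set of states the agent might currently be in, and the one-step lookahead used as the safety check (a practical shield would typically need to reason over longer horizons).

```python
import random

# Hypothetical toy model (not from the paper): states 0..3 on a line, state 3 is unsafe.
UNSAFE = {3}
ACTIONS = ("left", "stay", "right")


def successors(state, action):
    """States the assumed (partial) model considers possible after taking `action`."""
    step = {"left": -1, "stay": 0, "right": 1}[action]
    return {min(3, max(0, state + step))}


def shield(possible_states):
    """Actions that avoid UNSAFE from every state the agent might currently be in.

    `possible_states` stands in for partial observability: the agent does not
    know its exact state, only a set of candidates. This is a one-step check.
    """
    return [
        a for a in ACTIONS
        if all(successors(s, a).isdisjoint(UNSAFE) for s in possible_states)
    ]


def shielded_sample(action_probs, possible_states, rng=random):
    """Sample from the policy's distribution restricted to shield-approved actions."""
    allowed = shield(possible_states)
    if not allowed:
        raise RuntimeError("no safe action available in this toy model")
    weights = [action_probs[a] for a in allowed]
    if sum(weights) == 0:
        # The policy put all its mass on blocked actions: fall back to uniform.
        weights = [1.0] * len(allowed)
    return rng.choices(allowed, weights=weights, k=1)[0]


if __name__ == "__main__":
    policy = {"left": 0.1, "stay": 0.2, "right": 0.7}
    # The agent may be in state 1 or 2; "right" could reach state 3 from 2, so it is blocked.
    print(shield({1, 2}))
    print(shielded_sample(policy, {1, 2}))
```

In this sketch the policy is left untouched and the shield only filters its output, so removing the filter, as the abstract describes for a shield that turns out to be too conservative, amounts to sampling from `action_probs` directly.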
