The safety constraints commonly used by existing safe reinforcement learning (RL) methods are defined only in expectation over initial states, so individual states may still be unsafe, which is unacceptable for real-world safety-critical tasks. In this paper, we introduce the feasible actor-critic (FAC) algorithm, the first model-free constrained RL method that considers statewise safety, i.e., safety for each initial state. We observe that some states are inherently unsafe no matter what policy is chosen, while for other states there exist policies that ensure safety; we call such states and policies feasible. By constructing a statewise Lagrange function that can be estimated from RL samples and adopting an additional neural network to approximate the statewise Lagrange multiplier, we obtain the optimal feasible policy, which ensures safety for every feasible state and acts as safely as possible for infeasible states. Furthermore, the trained multiplier network can indicate whether a given state is feasible through the statewise complementary slackness condition. We provide theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in terms of both constraint satisfaction and reward optimization. Experimental results on robot locomotion tasks and safe exploration tasks verify the safety enhancement and feasibility interpretation of the proposed method.
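To make the statewise Lagrangian construction concrete, the following is a minimal sketch (not the authors' implementation) of a multiplier network that outputs a non-negative lambda(s) per state, combined with a statewise Lagrangian of the form L = E_s[-Q_r(s, pi(s)) + lambda(s)(Q_c(s, pi(s)) - d)]. It assumes a PyTorch-style setup; names such as MultiplierNet, statewise_lagrangian, and cost_limit are illustrative assumptions rather than identifiers from the paper.

```python
# Sketch of a statewise Lagrange multiplier network and Lagrangian estimate.
# The policy minimizes the Lagrangian while the multiplier net maximizes it,
# so lambda(s) grows only on states whose cost value exceeds the limit d.
# All module and variable names are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiplierNet(nn.Module):
    """Maps a state s to a non-negative statewise multiplier lambda(s)."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Softplus keeps lambda(s) >= 0, as a Lagrange multiplier requires.
        return F.softplus(self.body(state)).squeeze(-1)


def statewise_lagrangian(q_reward, q_cost, lam, cost_limit):
    """Sample average of -Q_r(s, a) + lambda(s) * (Q_c(s, a) - d)."""
    return (-q_reward + lam * (q_cost - cost_limit)).mean()


if __name__ == "__main__":
    # Toy usage with random tensors standing in for critic outputs.
    states = torch.randn(32, 8)        # batch of sampled states
    q_reward = torch.randn(32)         # Q_r(s, pi(s)) from the reward critic
    q_cost = torch.rand(32)            # Q_c(s, pi(s)) from the cost critic

    lam_net = MultiplierNet(state_dim=8)
    lam = lam_net(states)

    loss_actor = statewise_lagrangian(q_reward, q_cost, lam, cost_limit=0.5)
    # The multiplier net ascends the same objective (gradient ascent on lambda),
    # pushing lambda(s) up wherever the statewise constraint Q_c <= d is violated.
    loss_multiplier = -loss_actor
    print(loss_actor.item(), loss_multiplier.item())
```

After training, a near-zero lambda(s) on a state with an active constraint, or a large lambda(s) on a state whose constraint cannot be satisfied, is what the abstract refers to as reading feasibility off the statewise complementary slackness condition.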