Safety is essential for reinforcement learning (RL) applied in real-world situations. Chance constraints are suitable for representing safety requirements in stochastic systems. Previous chance-constrained RL methods usually suffer from a low convergence rate, or only learn a conservative policy. In this paper, we propose a model-based chance-constrained actor-critic (CCAC) algorithm that can efficiently learn a safe and non-conservative policy. Different from existing methods that optimize a conservative lower bound, CCAC directly solves the original chance-constrained problem, in which the objective function and the safe probability are optimized simultaneously with adaptive weights. To improve the convergence rate, CCAC utilizes the gradient of the dynamics model to accelerate policy optimization. The effectiveness of CCAC is demonstrated on a stochastic car-following task. Experiments indicate that, compared with previous RL methods, CCAC improves performance while guaranteeing safety, with a five times faster convergence rate. It also achieves 100 times higher online computation efficiency than traditional safety techniques such as stochastic model predictive control.
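For reference, the chance-constrained problem alluded to above can be stated in a standard form (the notation here is illustrative and may differ from the paper's own):

\[
\max_{\pi} \ \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T} \gamma^{t} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\Pr_{\tau \sim \pi}\!\bigl(s_t \in \mathcal{S}_{\mathrm{safe}},\ \forall t \le T\bigr) \ge 1 - \delta,
\]

where \(\pi\) is the policy, \(r\) the reward, \(\mathcal{S}_{\mathrm{safe}}\) the set of safe states, and \(1-\delta\) the required safe probability. Rather than replacing the probabilistic constraint with a conservative surrogate bound, the approach described above optimizes the objective and the safe probability jointly with adaptive weights.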