CASA: A Bridge Between Gradient of Policy Improvement and Policy Evaluation


CASA · policy evaluation · policy improvement · approximation error · estimator

May 27, 2021


Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng

This paper introduces CASA (Critic AS an Actor), a novel design for model-free reinforcement learning. CASA follows the actor-critic framework and estimates the state-value, the state-action-value, and the policy simultaneously. We prove that CASA integrates a consistent path for policy evaluation and policy improvement, which completely eliminates the gradient conflict between the two. The policy evaluation is equivalent to a compensational policy improvement, which alleviates the function approximation error, and is also equivalent to an entropy-regularized policy improvement, which prevents the policy from being trapped in a suboptimal solution. Building on this design, an expectation-correct Doubly Robust Trace is introduced to learn the state-value and the state-action-value, and its convergence is guaranteed. Our experiments show that the design achieves state-of-the-art performance on the Arcade Learning Environment.
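The abstract does not spell out how the three estimates are tied together, so the following is only a minimal sketch of one way a policy and a state-action-value can be derived from shared quantities. The parameterization used here (a softmax policy over per-action logits, with the state-action-value formed by centering those logits around the state-value) is an assumption for illustration, not the authors' confirmed architecture. The names `softmax`, `shared_heads`, `logits`, `state_value`, and `tau` are hypothetical.

```python
import math


def softmax(logits, tau=1.0):
    """Numerically stable softmax with temperature tau (assumed hyperparameter)."""
    m = max(logits)
    exps = [math.exp((x - m) / tau) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shared_heads(logits, state_value, tau=1.0):
    """Derive a policy and Q-values from shared per-action logits.

    Assumption (not taken from the abstract): the policy is a softmax over the
    logits, and Q(s, a) = V(s) + logit(a) - E_pi[logit], so that the policy-weighted
    average of Q equals V by construction.
    """
    pi = softmax(logits, tau)
    baseline = sum(p * a for p, a in zip(pi, logits))
    q = [state_value + a - baseline for a in logits]
    return pi, q


# Hypothetical numbers for a single state with three actions.
pi, q = shared_heads([1.0, 2.0, 0.5], state_value=3.0)
expected_q = sum(p * x for p, x in zip(pi, q))
```

In this sketch the action with the largest Q-value is also the most probable action under the policy, and `expected_q` matches `state_value` up to floating-point error. Because both outputs come from the same logits, a gradient step on either one moves the other as well, which illustrates in a loose way why a shared path could avoid conflicting updates.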

