Target Entropy Annealing for Discrete Soft Actor-Critic - Zhuanzhi Papers


December 6, 2021

Target Entropy Annealing for Discrete Soft Actor-Critic


Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature $\alpha$, which determines how "soft" the policy should be. Counter-intuitively, empirical evidence shows that SAC does not perform well in discrete domains. In this paper we investigate possible explanations for this phenomenon and propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter of SAC. The target entropy is a constant in the temperature Lagrange term and represents the target policy entropy in discrete SAC. We compare our method with SAC using different constant target entropies on Atari 2600 games, and analyze how our scheduling affects SAC.
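As a rough illustration of the mechanism described above, the following NumPy sketch shows how the temperature $\alpha$ can be tuned toward a target entropy in discrete SAC, and how that target could be annealed over training. It is a minimal sketch under stated assumptions, not the authors' implementation: the temperature objective uses the common discrete-action form in which the expectation over actions is computed exactly from the policy probabilities; the helper names (`temperature_grad`, `target_entropy_schedule`, `run_demo`) are hypothetical; the linear schedule and its `start_ratio`/`end_ratio` defaults are assumptions and are NOT the TES-SAC schedule from the paper; and the actor is replaced by a fixed batch of softmax probabilities so that only the temperature update runs.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the action dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def policy_entropy(probs):
    # H(pi(.|s)) = -sum_a pi(a|s) log pi(a|s), averaged over the batch of states.
    return float(-(probs * np.log(probs + 1e-12)).sum(axis=-1).mean())


def temperature_grad(log_alpha, probs, target_entropy):
    # Discrete-action temperature objective (assumed form):
    #   J(alpha) = E_s[ sum_a pi(a|s) * (-alpha * (log pi(a|s) + H_target)) ]
    #            = E_s[ alpha * (H(pi(.|s)) - H_target) ]
    # With alpha = exp(log_alpha), dJ/dlog_alpha = alpha * (H - H_target).
    alpha = np.exp(log_alpha)
    return alpha * (policy_entropy(probs) - target_entropy)


def target_entropy_schedule(step, n_actions, total_steps,
                            start_ratio=0.98, end_ratio=0.5):
    # HYPOTHETICAL linear annealing of the target entropy, expressed as a
    # fraction of the maximum entropy log|A|. Not the paper's TES-SAC schedule.
    max_entropy = np.log(n_actions)
    frac = min(step / total_steps, 1.0)
    ratio = start_ratio + frac * (end_ratio - start_ratio)
    return ratio * max_entropy


def run_demo(steps=1000, n_actions=6, lr=0.01):
    # Fixed stand-in for the actor's output on a batch of 32 states.
    rng = np.random.default_rng(0)
    probs = softmax(rng.normal(size=(32, n_actions)))
    log_alpha = 0.0
    targets = []
    for step in range(steps):
        target = target_entropy_schedule(step, n_actions, steps)
        # Gradient descent on log_alpha: alpha rises when entropy < target.
        log_alpha -= lr * temperature_grad(log_alpha, probs, target)
        targets.append(target)
    return log_alpha, targets
```

Parameterizing the temperature as `log_alpha` keeps $\alpha$ positive without a constraint. When the policy's entropy is below the current target, the gradient is negative and the update raises $\alpha$, pushing the policy to be softer; lowering the target over time lets $\alpha$ shrink so the policy can become more deterministic.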

