In reinforcement learning (RL), function approximation errors are known to lead to overestimated Q-values, which can greatly degrade policy performance. This paper presents a distributional soft actor-critic (DSAC) algorithm, an off-policy RL method for continuous control settings, that improves policy performance by mitigating Q-value overestimations. We first show theoretically that learning a distribution over state-action returns can effectively mitigate Q-value overestimations, because it adaptively adjusts the update stepsize of the Q-value function. Then, a distributional soft policy iteration (DSPI) framework is developed by embedding the return distribution function into maximum entropy RL. Finally, we present a deep off-policy actor-critic variant of DSPI, called DSAC, which directly learns a continuous return distribution while keeping the variance of the state-action returns within a reasonable range to address exploding and vanishing gradient problems. We evaluate DSAC on the suite of MuJoCo continuous control tasks, achieving state-of-the-art performance.
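To make the mechanism described above concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation, of a critic that parameterizes the state-action return as a Gaussian and clamps its log standard deviation to a bounded range so the variance stays within a reasonable interval. The class and function names (`DistributionalCritic`, `critic_loss`) and the bounds `LOG_STD_MIN`/`LOG_STD_MAX` are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

# Assumed bounds on the log standard deviation of the return distribution;
# clamping keeps the variance in a reasonable range so the gradients of the
# distributional loss neither explode nor vanish.
LOG_STD_MIN, LOG_STD_MAX = -5.0, 2.0

class DistributionalCritic(nn.Module):
    """Outputs a Gaussian return distribution N(mean, std^2) for (s, a)."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, 1)     # mean of Z(s, a), i.e. Q(s, a)
        self.log_std_head = nn.Linear(hidden, 1)  # log std of Z(s, a)

    def forward(self, state, action):
        h = self.net(torch.cat([state, action], dim=-1))
        mean = self.mean_head(h)
        log_std = torch.clamp(self.log_std_head(h), LOG_STD_MIN, LOG_STD_MAX)
        return mean, log_std.exp()

def critic_loss(critic, state, action, target_return):
    """Negative log-likelihood of a (soft) target return under the predicted
    Gaussian. Because the likelihood is scaled by the learned std, the
    effective update stepsize of the Q-value (the distribution's mean)
    adapts to the estimated return uncertainty."""
    mean, std = critic(state, action)
    dist = torch.distributions.Normal(mean, std)
    return -dist.log_prob(target_return).mean()
```

Under this sketch, the target return would typically be a sample of the soft Bellman target (reward plus discounted entropy-regularized next-state return) computed from a target critic; the key point illustrated here is only the bounded-variance Gaussian parameterization and the likelihood-based critic update.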