In recent years, distributional reinforcement learning has produced many state-of-the-art results. Increasingly sample-efficient distributional algorithms for the discrete action domain have been developed, differing primarily in how they parameterize their approximations of value distributions and how they quantify the differences between those distributions. In this work we transfer three of the most well-known and successful of these algorithms (QR-DQN, IQN and FQF) to the continuous action domain by extending two powerful actor-critic algorithms (TD3 and SAC) with distributional critics. We investigate whether the relative performance of these methods in the discrete action setting carries over to the continuous case. To that end, we compare them empirically on the PyBullet implementations of a set of continuous control tasks. Our results indicate qualitative invariance with respect to the number and placement of distributional atoms in the deterministic, continuous action setting.
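To make the idea of a distributional critic concrete, the following is a minimal sketch, assuming PyTorch, of a QR-DQN-style quantile critic for a continuous-action actor-critic: the critic maps a (state, action) pair to N quantile estimates of the return distribution and is trained with the quantile Huber loss. The names (QuantileCritic, quantile_huber_loss) and hyperparameters are illustrative assumptions, not the implementation used in this work.

```python
import torch
import torch.nn as nn


class QuantileCritic(nn.Module):
    """Critic that outputs N quantile estimates of Z(s, a) at fixed fractions."""

    def __init__(self, state_dim, action_dim, n_quantiles=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_quantiles),  # one output per quantile fraction
        )
        # Fixed quantile midpoints tau_i = (2i + 1) / (2N), as in QR-DQN.
        taus = (torch.arange(n_quantiles, dtype=torch.float32) + 0.5) / n_quantiles
        self.register_buffer("taus", taus)

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # shape (batch, N)


def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Quantile Huber loss between predicted quantiles (batch, N)
    and target quantile samples (batch, M), averaged over the batch."""
    # Pairwise TD errors: target_j - pred_i, shape (batch, N, M).
    td = target.unsqueeze(1) - pred.unsqueeze(2)
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weighting |tau - 1{td < 0}| yields the quantile regression loss.
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(dim=1).mean()
```

In such a setup, the scalar Q-value used for the TD3/SAC policy update can be recovered as the mean over the predicted quantiles; IQN- and FQF-style variants differ mainly in sampling or learning the quantile fractions instead of fixing them.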