In distributional reinforcement learning, not only the expected return but the complete return distribution of a policy is taken into account. For a fixed policy, the return distribution is given as the solution of an associated distributional Bellman equation. In this note we consider general distributional Bellman equations and study the existence and uniqueness of their solutions as well as tail properties of return distributions. We give necessary and sufficient conditions for the existence and uniqueness of return distributions and identify cases of regular variation. We link distributional Bellman equations to multivariate affine distributional equations, and we show that any solution of a distributional Bellman equation can be obtained as the vector of marginal laws of a solution to a multivariate affine distributional equation. This makes the general theory of such equations applicable to the distributional reinforcement learning setting.
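For orientation, here is a minimal sketch of the two types of equations in standard notation; the symbols used (finite state space $S$, return variables $G$, rewards $R$, discount factor $\gamma$, transition kernel $p$, random pair $(A,B)$) are illustrative assumptions and need not match the notation of the note. For a fixed policy on a finite state space, the return distribution solves, for every state $s \in S$,
\[
G(s) \;\stackrel{d}{=}\; R(s) + \gamma\, G(S'), \qquad S' \sim p(\cdot \mid s), \qquad \big(G(x)\big)_{x \in S} \ \text{independent of}\ \big(R(s), S'\big),
\]
whereas a multivariate affine distributional equation asks for a random vector $G \in \mathbb{R}^d$ satisfying
\[
G \;\stackrel{d}{=}\; A G + B, \qquad G \ \text{independent of}\ (A,B),
\]
for a given random $d \times d$ matrix $A$ and random vector $B$. The link stated in the abstract is that every solution of a distributional Bellman equation arises as the vector of marginal laws of a solution to such an affine equation.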