In distributional reinforcement learning, not only the expected return of a policy but its complete return distribution is taken into account. The return distribution for a fixed policy is given as the fixed point of an associated distributional Bellman operator. In this note we consider general distributional Bellman operators and study existence and uniqueness of their fixed points as well as their tail properties. We give necessary and sufficient conditions for existence and uniqueness of return distributions and identify cases of regular variation. We link distributional Bellman equations to multivariate distributional equations of the form $\textbf{X} =_d \textbf{A}\textbf{X} + \textbf{B}$, where $\textbf{X}$ and $\textbf{B}$ are $d$-dimensional random vectors, $\textbf{A}$ is a random $d\times d$ matrix, and $\textbf{X}$ and $(\textbf{A},\textbf{B})$ are independent. We show that any fixed point of a distributional Bellman operator can be obtained as the vector of marginal laws of a solution to such a multivariate distributional equation. This makes the general theory of such equations applicable to the distributional reinforcement learning setting.
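For orientation, a minimal sketch of the classical special case (the notation $G(s)$, $R(s)$, $S'_s$, $\gamma$ is illustrative and not taken from the note, whose operators are more general): for a fixed policy on states $s = 1,\dots,d$ with constant discount factor $\gamma \in (0,1)$, the return distributions satisfy
\[
  G(s) \;=_d\; R(s) + \gamma\, G(S'_s), \qquad s = 1,\dots,d,
\]
where $R(s)$ is the random immediate reward, $S'_s$ the random successor state, and $(R(s), S'_s)$ is independent of $(G(1),\dots,G(d))$. Stacking the marginals into a vector $\textbf{X}$ and encoding transitions and discounting in $(\textbf{A},\textbf{B})$, e.g. with row $s$ of $\textbf{A}$ equal to $\gamma$ times the indicator of $S'_s$ and $\textbf{B}_s = R(s)$, yields an equation of the form $\textbf{X} =_d \textbf{A}\textbf{X} + \textbf{B}$ as above.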