Robust Reinforcement Learning tries to make predictions more robust to changes in the dynamics or rewards of the system. This problem is particularly important when the dynamics and rewards of the environment are estimated from data. In this paper, we approximate the Robust Reinforcement Learning problem constrained with a $\Phi$-divergence using an approximate Risk-Averse formulation. We show that the classical Reinforcement Learning objective can be robustified by penalizing it with the standard deviation of the return. Two algorithms based on Distributional Reinforcement Learning, one for discrete and one for continuous action spaces, are proposed and tested in a classical Gym environment to demonstrate the robustness of the algorithms.
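As a minimal illustration of the standard-deviation penalization mentioned above (a sketch under the assumption of a $\chi^2$-divergence ball of radius $\rho$ around the nominal return distribution $P$, not the paper's full derivation), the worst-case expected return admits a variance-based approximation:
\[
\inf_{Q \,:\, D_{\chi^2}(Q \,\|\, P) \le \rho} \; \mathbb{E}_{Q}[R]
\;\approx\;
\mathbb{E}_{P}[R] \;-\; \sqrt{2\rho}\,\sqrt{\operatorname{Var}_{P}[R]},
\]
so maximizing the robust (left-hand) objective is approximately equivalent to maximizing the nominal expected return penalized by its standard deviation, which is the risk-averse form exploited by the distributional algorithms.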