Reinforcement learning (RL) is an important field of research in machine learning that is increasingly being applied to complex optimization problems in physics. In parallel, concepts from physics have contributed to important advances in RL, with developments such as entropy-regularized RL. While these developments have led to advances in both fields, obtaining analytical solutions for optimization in entropy-regularized RL is currently an open problem. In this paper, we establish a mapping between entropy-regularized RL and research in non-equilibrium statistical mechanics, with a focus on Markovian processes conditioned on rare events. In the long-time limit, we apply approaches from large deviation theory to derive exact analytical results for the optimal policy and optimal dynamics in Markov Decision Process (MDP) models of reinforcement learning. These results provide a novel analytical and computational framework for entropy-regularized RL, which is validated by simulations. The mapping established in this work connects current research in reinforcement learning and non-equilibrium statistical mechanics, thereby opening new avenues for the application of analytical and computational approaches from one field to cutting-edge problems in the other.
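To make the entropy-regularized objective concrete, the following is a minimal sketch of standard soft (entropy-regularized) value iteration on a toy MDP. It illustrates the log-sum-exp Bellman backup and the softmax form of the resulting optimal policy; the state/action counts, rewards, discount factor, and temperature are all illustrative assumptions, and this generic iterative scheme is shown only for orientation, not as the analytical large-deviation solution derived in the paper.

```python
import numpy as np

# Minimal sketch: entropy-regularized ("soft") value iteration on a toy MDP.
# All quantities below (sizes, rewards, temperature) are hypothetical.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
temperature = 1.0   # strength of the entropy regularization
gamma = 0.95        # discount factor

# Random transition probabilities P[s, a, s'] and rewards r[s, a]
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((n_states, n_actions))

V = np.zeros(n_states)
for _ in range(1000):
    # Soft Bellman backup:
    #   Q(s,a) = r(s,a) + gamma * E_{s'}[V(s')]
    #   V(s)   = tau * log sum_a exp(Q(s,a)/tau)   (log-sum-exp replaces max)
    Q = r + gamma * np.einsum("sap,p->sa", P, V)
    V_new = temperature * np.log(np.exp(Q / temperature).sum(axis=1))
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# Optimal entropy-regularized policy is a softmax over the soft Q-values
policy = np.exp((Q - V[:, None]) / temperature)
policy /= policy.sum(axis=1, keepdims=True)
print(policy)
```

In this standard formulation, the hard max of ordinary value iteration is replaced by a temperature-weighted log-sum-exp, and the optimal policy becomes a Boltzmann (softmax) distribution over actions, which is the structure that admits the statistical-mechanics interpretation exploited in the paper.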