In this paper, we formalize the almost sure convergence of $Q$-learning and linear temporal difference (TD) learning with Markovian samples in the Lean 4 theorem prover, building on the Mathlib library. $Q$-learning and linear TD are among the earliest and most influential reinforcement learning (RL) algorithms. The investigation of their convergence properties was not only a major research topic during the early development of the RL field but also continues to receive increasing attention today. This paper formally verifies their almost sure convergence in a unified framework based on the Robbins-Siegmund theorem. The framework developed in this work can be readily extended to convergence rates and other modes of convergence. This work thus takes an important step towards fully formalizing convergent RL results. The code is available at https://github.com/ShangtongZhang/rl-theory-in-lean.
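For readers unfamiliar with how such statements look in Lean 4, the following is a minimal sketch of the kind of property being formalized, written on top of standard Mathlib notions (`MeasureTheory.Measure`, `Filter.Tendsto`). The name `ConvergesAlmostSurely` is illustrative and hypothetical; the repository linked above contains the actual definitions and theorem statements used in the formalization.

```lean
import Mathlib

open MeasureTheory Filter Topology

/-- Illustrative sketch (not the repository's definition): a real-valued
process `x : ℕ → Ω → ℝ` converges almost surely to a limit `L` under the
measure `μ` if, for `μ`-almost every `ω`, the sequence `n ↦ x n ω`
tends to `L ω`. -/
def ConvergesAlmostSurely {Ω : Type*} [MeasurableSpace Ω] (μ : Measure Ω)
    (x : ℕ → Ω → ℝ) (L : Ω → ℝ) : Prop :=
  ∀ᵐ ω ∂μ, Tendsto (fun n => x n ω) atTop (𝓝 (L ω))
```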