Deep Q-Learning is an important reinforcement learning algorithm that trains a deep neural network, called a Deep Q-Network (DQN), to approximate the well-known Q-function. Although the algorithm is wildly successful under laboratory conditions, serious gaps between theory and practice, as well as a lack of formal guarantees, prevent its use in the real world. Adopting a dynamical systems perspective, we provide a theoretical analysis of a popular version of Deep Q-Learning under realistic and verifiable assumptions. More specifically, we prove an important result on the convergence of the algorithm, characterizing the asymptotic behavior of the learning process. Our result sheds light on hitherto unexplained properties of the algorithm and helps to explain empirical observations, such as performance inconsistencies even after training. Unlike previous theories, our analysis accommodates state Markov processes with multiple stationary distributions. Despite the focus on Deep Q-Learning, we believe that our theory may be applied to understand other deep learning algorithms.
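For context, a minimal sketch of the objective a DQN is typically trained to minimize, in its standard form from the literature rather than as stated by the abstract, is

\[
  L(\theta) \;=\; \mathbb{E}_{(s,a,r,s')\sim\mathcal{D}}\!\left[\Big(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta)\Big)^{2}\right],
\]

where \(Q(s,a;\theta)\) is the network's approximation of the Q-function, \(\gamma\) is the discount factor, and \(\theta^{-}\) and \(\mathcal{D}\) denote target-network parameters and a replay buffer, both standard ingredients assumed here for illustration rather than details given in the abstract.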