Q-learning is a widely used algorithm in the reinforcement learning community. In the lookup-table setting, its convergence is well established. However, its behavior is known to be unstable in the linear function approximation case. This paper develops a new Q-learning algorithm that converges when linear function approximation is used. We prove that simply adding an appropriate regularization term ensures convergence of the algorithm. We prove its stability using a recent analysis tool based on switching system models. Moreover, we experimentally show that it converges in environments where Q-learning with linear function approximation is known to diverge. We also provide an error bound on the solution to which the algorithm converges.
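To make the idea concrete, the following is a minimal sketch of Q-learning with linear function approximation in which an L2 penalty on the weight vector is folded into the update. The toy MDP, the feature map, and the regularization coefficient eta are illustrative assumptions for exposition; they are not the paper's exact construction or its tuned constants.

```python
# Sketch (assumed setup): semi-gradient Q-learning with linear features,
# plus an extra -alpha * eta * theta term implementing L2 regularization.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, d = 5, 2, 4                         # toy MDP dimensions (assumed)
phi = rng.normal(size=(n_states, n_actions, d))           # fixed random feature map
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel
R = rng.normal(size=(n_states, n_actions))                # reward table
gamma, alpha, eta = 0.9, 0.05, 0.1                        # discount, step size, regularization

theta = np.zeros(d)                                       # linear Q-value weights
s = 0
for t in range(20000):
    a = rng.integers(n_actions)                           # uniform exploratory behavior policy
    s_next = rng.choice(n_states, p=P[s, a])
    # TD target uses the greedy action under the current weights
    q_next = phi[s_next] @ theta
    target = R[s, a] + gamma * np.max(q_next)
    td_error = target - phi[s, a] @ theta
    # Standard semi-gradient step plus the regularization term; the extra
    # -alpha * eta * theta shrinkage is the kind of "appropriate
    # regularization" the abstract refers to.
    theta += alpha * td_error * phi[s, a] - alpha * eta * theta
    s = s_next

print("final weights:", theta)
```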