In this paper, we establish a theoretical comparison between the asymptotic mean-squared errors of Double Q-learning and Q-learning. Our result builds on an analysis of linear stochastic approximation based on Lyapunov equations, and it applies to both the tabular setting and linear function approximation, provided that the optimal policy is unique and the algorithms converge. We show that the asymptotic mean-squared error of Double Q-learning is exactly equal to that of Q-learning if Double Q-learning uses twice the learning rate of Q-learning and outputs the average of its two estimators. We also present some practical implications of this theoretical observation using simulations.
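The comparison in the abstract can be made concrete with a minimal tabular sketch: run standard Q-learning with step size alpha, and Double Q-learning with step size 2*alpha whose output is the average of its two tables. Everything below (the randomly generated MDP, the uniform exploration policy, the constant step sizes, and all names) is an illustrative assumption, not the paper's experimental setup.

```python
import numpy as np

# Hypothetical small MDP, used only for illustration.
rng = np.random.default_rng(0)
nS, nA, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = next-state distribution
R = rng.normal(size=(nS, nA))                  # mean reward for (s, a)

def step(s, a):
    """Sample a noisy reward and a next state from the MDP."""
    s2 = rng.choice(nS, p=P[s, a])
    r = R[s, a] + rng.normal(scale=0.1)
    return r, s2

def q_learning(alpha, n_steps=20000):
    Q = np.zeros((nS, nA))
    s = 0
    for _ in range(n_steps):
        a = int(rng.integers(nA))              # uniform exploration
        r, s2 = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q

def double_q_learning(alpha, n_steps=20000):
    # Two estimators; each update bootstraps off the other table.
    QA, QB = np.zeros((nS, nA)), np.zeros((nS, nA))
    s = 0
    for _ in range(n_steps):
        a = int(rng.integers(nA))
        r, s2 = step(s, a)
        if rng.random() < 0.5:
            QA[s, a] += alpha * (r + gamma * QB[s2, QA[s2].argmax()] - QA[s, a])
        else:
            QB[s, a] += alpha * (r + gamma * QA[s2, QB[s2].argmax()] - QB[s, a])
        s = s2
    return 0.5 * (QA + QB)                     # output: average of the two estimators

alpha = 0.05
Q_single = q_learning(alpha)
Q_double = double_q_learning(2 * alpha)        # twice the learning rate, as in the result
```

The paper's claim concerns the asymptotic mean-squared error of these two outputs around the fixed point; this sketch only shows the pairing of hyperparameters (doubled step size, averaged output) under which the two are predicted to coincide.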