Recent developments in Deep Reinforcement Learning (DRL) have demonstrated the superior performance of neural networks in solving challenging problems with large or even continuous state spaces. One specific approach is to deploy neural networks to approximate value functions by minimising the Mean Squared Bellman Error (MSBE) function. Despite the great successes of DRL, the development of reliable and efficient numerical algorithms to minimise the MSBE is still of great scientific interest and practical demand. This challenge is partially due to the underlying optimisation problem being highly non-convex, and partially due to the use of incomplete gradient information, as in Semi-Gradient algorithms. In this work, we analyse the MSBE from a smooth optimisation perspective and develop an efficient Approximate Newton algorithm. First, we conduct a critical point analysis of the error function and provide technical insights on optimisation and design choices for neural networks. Assuming that global minima exist and the objective fulfils certain conditions, suboptimal local minima can be avoided by using over-parametrised neural networks. Based on this analysis, we construct a Gauss-Newton Residual Gradient algorithm in two variations. The first variation applies to discrete state spaces and exact learning. We numerically confirm theoretical properties of this algorithm, such as local quadratic convergence to a global minimum. The second variation employs sampling and can be used in the continuous setting. We demonstrate the feasibility and generalisation capabilities of the proposed algorithm empirically on continuous control problems and provide a numerical verification of our critical point analysis. We also outline the difficulties of combining Semi-Gradient approaches with Hessian information. To benefit from second-order information, complete derivatives of the MSBE must be considered during training.
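To make the discrete, exact-learning variation more concrete, the following is a minimal NumPy sketch of a Gauss-Newton step on the Bellman residual. It is not the authors' implementation. The following are all assumptions made for illustration: the small random Markov reward process (`P`, `R`, `gamma`), the one-hidden-layer tanh network, the backtracking safeguard, and every function name. The objective is half the sum of squared residuals, which equals the MSBE up to constant scaling. The key point is that the Jacobian `J` differentiates the bootstrapped target as well; a Semi-Gradient variant would use `G` in place of `A @ G`.

```python
import numpy as np

def value_and_jacobian(theta, X, h):
    """V(s) = tanh(X W) w for all states, plus its Jacobian w.r.t. theta."""
    n, d = X.shape
    W = theta[: d * h].reshape(d, h)
    w = theta[d * h:]
    H = np.tanh(X @ W)                                    # (n, h)
    V = H @ w                                             # (n,)
    dW = X[:, :, None] * ((1.0 - H**2) * w)[:, None, :]   # (n, d, h)
    G = np.hstack([dW.reshape(n, d * h), H])              # (n, d*h + h)
    return V, G

def bellman_residual(theta, X, h, P, R, gamma):
    """Residual r = V - (R + gamma P V) and its complete Jacobian."""
    V, G = value_and_jacobian(theta, X, h)
    A = np.eye(len(R)) - gamma * P
    return A @ V - R, A @ G   # Semi-Gradient would return G instead of A @ G

def msbe(theta, X, h, P, R, gamma):
    r, _ = bellman_residual(theta, X, h, P, R, gamma)
    return 0.5 * float(r @ r)

def gauss_newton_step(theta, X, h, P, R, gamma):
    """One Gauss-Newton step with a simple backtracking safeguard (assumption)."""
    r, J = bellman_residual(theta, X, h, P, R, gamma)
    f0 = 0.5 * float(r @ r)
    # Minimum-norm least-squares solution of J d = -r (J is wide when
    # the network is over-parametrised).
    direction = -np.linalg.lstsq(J, r, rcond=None)[0]
    step = 1.0
    for _ in range(30):
        candidate = theta + step * direction
        if msbe(candidate, X, h, P, R, gamma) < f0:
            return candidate
        step *= 0.5
    return theta  # no decrease found; keep the current parameters

def run_demo(seed=0, n_states=5, hidden=8, gamma=0.9, iters=20):
    """Run Gauss-Newton on a random Markov reward process; return MSBE history."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)        # row-stochastic transitions
    R = rng.standard_normal(n_states)        # expected one-step rewards
    X = np.eye(n_states)                     # one-hot state features
    theta = 0.5 * rng.standard_normal(n_states * hidden + hidden)
    history = [msbe(theta, X, hidden, P, R, gamma)]
    for _ in range(iters):
        theta = gauss_newton_step(theta, X, hidden, P, R, gamma)
        history.append(msbe(theta, X, hidden, P, R, gamma))
    return history

if __name__ == "__main__":
    for k, value in enumerate(run_demo()):
        print(k, value)
```

In this sketch the network has 48 parameters for 5 states, so it is over-parametrised in the sense used above. The backtracking loop only guarantees that the objective never increases; it makes no claim about the convergence rate.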