We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to the solution of a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution, which is the solution of the Bellman equation and thus yields the optimal control for the problem. In addition, we study the convergence of the solution of the limit differential equation to this stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks trained on i.i.d. data with stochastic gradient descent under the widely used Xavier initialization.
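For concreteness, the algorithm and the fixed point referred to above can be written as follows; this is a minimal sketch in generic notation ($Q_\theta$ for the parameterized value function, step sizes $\alpha_k$, discount factor $\gamma \in (0,1)$), not necessarily the paper's exact setup. A single Q-learning update of the parameters $\theta$ on a transition $(s_k, a_k, r_k, s_{k+1})$ reads
\[
\theta_{k+1} = \theta_k + \alpha_k \Big( r_k + \gamma \max_{a'} Q_{\theta_k}(s_{k+1}, a') - Q_{\theta_k}(s_k, a_k) \Big) \nabla_\theta Q_{\theta_k}(s_k, a_k),
\]
and the Bellman equation characterizing the stationary solution is
\[
Q^*(s, a) = \mathbb{E}\big[\, r(s, a) + \gamma \max_{a'} Q^*(s', a') \,\big|\, s, a \,\big].
\]
For a single-layer network with $N$ hidden units, Xavier initialization corresponds, up to constants, to the scaling
\[
Q_\theta(s, a) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} c^i \, \sigma\big( w^i \cdot (s, a) \big),
\]
with the parameters $c^i, w^i$ initialized i.i.d. with $O(1)$ variance.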