We derive and solve an "Equation of Motion" (EoM) for deep neural networks (DNNs): a differential equation that precisely describes the discrete learning dynamics of DNNs. Differential equations are continuous, yet they have played a prominent role in the study of discrete optimization algorithms such as gradient descent (GD). However, discretization error still leaves a gap between differential equations and the actual learning dynamics of DNNs. In this paper, we start from gradient flow (GF) and derive a counter term that cancels the discretization error between GF and GD. As a result, we obtain the EoM, a continuous differential equation that precisely describes the discrete learning dynamics of GD. We also derive the discretization error to show how precise the EoM is. In addition, we apply the EoM to two specific cases: scale-invariant and translation-invariant layers. The EoM highlights differences between continuous-time and discrete-time GD, indicating the importance of the counter term for a better description of the discrete learning dynamics of GD. Our experimental results support our theoretical findings.
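The gap between GF and GD, and the effect of a counter term, can be illustrated on a toy problem. The sketch below is not the paper's EoM. It assumes a one-dimensional quadratic loss L(theta) = a * theta^2 / 2 and uses only the standard leading-order correction from backward error analysis, -(lr / 4) * d/dtheta |dL/dtheta|^2, as a stand-in counter term. All function names and parameter values are hypothetical and chosen for illustration.

```python
import math

def gd_final(theta0, a, lr, steps):
    """Run discrete gradient descent on L(theta) = a * theta**2 / 2."""
    theta = theta0
    for _ in range(steps):
        theta -= lr * a * theta  # gradient of the loss is a * theta
    return theta

def gf_solution(theta0, a, t):
    """Closed-form gradient flow: d(theta)/dt = -a * theta."""
    return theta0 * math.exp(-a * t)

def corrected_flow_solution(theta0, a, lr, t):
    """Gradient flow plus an assumed first-order counter term.

    The counter term -(lr / 4) * d/dtheta (a * theta)**2 equals
    -(lr / 2) * a**2 * theta, so the flow stays linear and solvable.
    """
    return theta0 * math.exp(-(a + 0.5 * lr * a * a) * t)

# Illustrative values only (assumptions, not taken from the paper).
theta0, a, lr, steps = 1.0, 1.0, 0.1, 50
t = lr * steps  # continuous time matching `steps` GD updates

gd = gd_final(theta0, a, lr, steps)
gf = gf_solution(theta0, a, t)
corrected = corrected_flow_solution(theta0, a, lr, t)

err_gf = abs(gf - gd)
err_corrected = abs(corrected - gd)

print(f"GD after {steps} steps:   {gd:.6f}")
print(f"GF at t = {t:.1f}:        {gf:.6f} (error {err_gf:.2e})")
print(f"Corrected flow:        {corrected:.6f} (error {err_corrected:.2e})")
```

In this toy setting the corrected flow tracks the GD iterate more closely than plain GF does, which is the qualitative behavior the counter term is meant to provide. Higher-order terms would be needed to shrink the remaining error further.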