Interpreting the training of Deep Neural Networks (DNNs) as an optimal control problem over nonlinear dynamical systems has received considerable attention recently, yet algorithmic development along this line remains relatively limited. In this work, we take a step in this direction by reformulating the training procedure from a trajectory optimization perspective. We first show that most widely used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order method rooted in Approximate Dynamic Programming. Building on this connection, we propose a new class of optimizers, the DDP Neural Optimizer (DDPNOpt), for training feedforward and convolutional networks. DDPNOpt features layer-wise feedback policies that improve convergence and reduce sensitivity to hyper-parameters compared with existing methods. It outperforms other optimal-control-inspired training methods in both convergence and complexity, and is competitive with state-of-the-art first- and second-order methods. We also observe that DDPNOpt provides a surprising benefit in preventing gradient vanishing. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.
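To make the notion of a layer-wise feedback policy concrete, the following is a minimal sketch, not the paper's implementation: in DDP, the update to a layer's parameters combines an open-loop term with a feedback term that reacts to the deviation of that layer's activation from its nominal forward-pass value. All variable names, shapes, and the learning-rate handling below are illustrative assumptions.

```python
import numpy as np

def ddp_style_layer_update(theta, k_t, K_t, x_nominal, x_new, lr=0.1):
    """Apply an open-loop plus feedback update to one layer's parameters.

    theta     : flattened layer parameters, shape (p,)
    k_t       : open-loop update (e.g. a gradient-like step), shape (p,)
    K_t       : feedback gain mapping activation deviations to parameter
                corrections, shape (p, d)
    x_nominal : layer input on the nominal forward pass, shape (d,)
    x_new     : layer input after upstream layers were updated, shape (d,)

    Hypothetical sketch of a DDP-style feedback policy:
    delta_theta = k_t + K_t @ (x_new - x_nominal).
    """
    dx = x_new - x_nominal          # deviation from the nominal trajectory
    delta = k_t + K_t @ dx          # open-loop term plus feedback correction
    return theta - lr * delta

# Toy usage with random data (purely illustrative).
p, d = 6, 4
theta = np.zeros(p)
k_t = np.random.randn(p)
K_t = np.random.randn(p, d)
x_nom = np.random.randn(d)
x_new = x_nom + 0.01 * np.random.randn(d)
theta = ddp_style_layer_update(theta, k_t, K_t, x_nom, x_new)
```

The feedback term is what distinguishes such an update from a plain gradient step: when upstream updates perturb a layer's input, the gain K_t adjusts that layer's own update accordingly rather than applying a fixed, precomputed step.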