In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate using a new automatic differentiation technique. This technique relies on the computation of the {\em curvature}, a second-order quantity whose computational cost lies between that of the gradient and that of the Hessian-vector product. If (1C,1M) denotes, respectively, the computational time and memory footprint of the gradient method, the new technique increases the overall cost to either (1.5C,2M) or (2C,1M). This rescaling has the appealing characteristic of admitting a natural interpretation: it allows the practitioner to choose between exploration of the parameter set and convergence of the algorithm. The rescaling is adaptive, in that it depends on the data and on the direction of descent. The numerical experiments highlight the different exploration/convergence regimes.
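As an illustration (our own sketch; the abstract does not state the exact formula), a curvature-based rescaling of this kind can be written as follows. With loss $L$, parameters $\theta$, gradient $g = \nabla L(\theta)$ and descent direction $d$, the {\em curvature} along $d$ and the resulting step size read
\begin{equation*}
\kappa(d) \;=\; d^{\top} \nabla^{2} L(\theta)\, d,
\qquad
\eta(d) \;=\; \frac{-\,g^{\top} d}{\kappa(d)},
\end{equation*}
i.e.\ $\eta(d)$ minimizes the quadratic model $t \mapsto L(\theta) + t\, g^{\top} d + \tfrac{t^{2}}{2}\, \kappa(d)$ along $d$. The scalar $\kappa(d)$ only requires the second directional derivative of $L$ along $d$, rather than the full Hessian-vector product $\nabla^{2} L(\theta)\, d$, which is consistent with a cost sitting between that of the gradient and that of the Hessian-vector product.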