We propose AEGD, a new algorithm for first-order gradient-based optimization of non-convex objective functions, based on a dynamically updated energy variable. The method is shown to be unconditionally energy stable, irrespective of the step size. We prove energy-dependent convergence rates of AEGD for both non-convex and convex objectives, which for a suitably small step size recover the desired convergence rates of batch gradient descent. We also provide an energy-dependent bound on the convergence of AEGD to stationary points in the stochastic non-convex setting. The method is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that AEGD works well for a large variety of optimization problems: it is robust with respect to the initial data and capable of making rapid initial progress. The stochastic AEGD shows generalization performance comparable to, and often better than, SGD with momentum for deep neural networks.
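To make the idea of a dynamically updated energy variable concrete, below is a minimal sketch of one energy-adaptive gradient step of this flavor. It is written for illustration only: the abstract does not specify the update rule, so the transformed objective F(x) = sqrt(f(x) + c), the per-coordinate energy r, and the particular update formulas are assumptions here, not the paper's definition. The key property the sketch mimics is that the energy variable can only shrink, for any step size.

```python
import numpy as np

def energy_adaptive_step(f, grad_f, x, r, eta=0.2, c=1.0):
    """One illustrative energy-adaptive gradient step (assumed form, not the paper's exact update).

    r is a per-coordinate "energy" variable; the update below makes r
    monotonically non-increasing regardless of eta, which is the sense in
    which such a scheme is unconditionally energy stable.
    """
    # Gradient of the transformed objective F(x) = sqrt(f(x) + c), assuming f(x) + c > 0
    v = grad_f(x) / (2.0 * np.sqrt(f(x) + c))
    # Energy update: the denominator is >= 1, so r can only decrease, for any eta > 0
    r_new = r / (1.0 + 2.0 * eta * v * v)
    # Descent step scaled by the updated energy
    x_new = x - 2.0 * eta * r_new * v
    return x_new, r_new

# Toy usage on a non-convex double-well objective f(x) = (x^2 - 1)^2
f = lambda x: (x**2 - 1.0)**2
grad_f = lambda x: 4.0 * x * (x**2 - 1.0)

x = np.array([2.0])
r = np.sqrt(f(x) + 1.0)          # energy initialized at F(x_0)
for _ in range(200):
    x, r = energy_adaptive_step(f, grad_f, x, r, eta=0.2, c=1.0)
print(x, f(x))                   # x approaches a minimizer near 1.0
```

In this sketch the step size eta only affects how fast the energy decays, not whether it decays, which illustrates the step-size-independent stability claimed in the abstract; the precise AEGD update and its stability proof are given in the paper.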