Many of the new developments in machine learning are connected with gradient-based optimization methods. Recently, these methods have been studied using a variational perspective. This has opened up the possibility of introducing variational and symplectic methods from geometric integration. In particular, in this paper, we introduce variational integrators which allow us to derive different methods for optimization. Using both Hamilton's and Lagrange-d'Alembert's principles, we derive two families of optimization methods in one-to-one correspondence that generalize Polyak's heavy ball and the well-known Nesterov accelerated gradient method, the latter of which mimics the behavior of the former while reducing the oscillations of classical momentum methods. However, since the systems considered are explicitly time-dependent, the preservation of symplecticity of autonomous systems occurs here solely on the fibers. Several experiments illustrate the results.
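For reference, the two classical updates that the variational-integrator families generalize can be written in a few lines. The sketch below is a minimal illustration of Polyak's heavy ball and Nesterov's accelerated gradient for a fixed step size; the learning rate `lr`, momentum coefficient `mu`, and the quadratic test function are illustrative choices, not parameters taken from the paper.

```python
import numpy as np

def heavy_ball(grad, x0, lr=1e-2, mu=0.9, steps=200):
    """Polyak's heavy ball: x_{k+1} = x_k - lr*grad(x_k) + mu*(x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        x_next = x - lr * grad(x) + mu * (x - x_prev)
        x_prev, x = x, x_next
    return x

def nesterov(grad, x0, lr=1e-2, mu=0.9, steps=200):
    """Nesterov accelerated gradient: the gradient is evaluated at a look-ahead point."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        y = x + mu * (x - x_prev)      # look-ahead (momentum) point
        x_next = y - lr * grad(y)
        x_prev, x = x, x_next
    return x

# Illustrative usage: minimize the quadratic f(x) = 0.5 * x^T A x (hypothetical example).
A = np.diag([1.0, 10.0])
grad_f = lambda x: A @ x
x_hb = heavy_ball(grad_f, np.array([5.0, 5.0]))
x_nag = nesterov(grad_f, np.array([5.0, 5.0]))
```

Evaluating the gradient at the extrapolated point `y` rather than at `x` is what dampens the oscillations of plain momentum, which is the behavior the Nesterov-type family in the paper reproduces.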