On dissipative symplectic integration with applications to gradient-based optimization

Recently, continuous-time dynamical systems have proved useful in providing conceptual and quantitative insights into gradient-based optimization, which is widely used in modern machine learning and statistics. An important question that arises in this line of work is how to discretize the system in such a way that its stability and rates of convergence are preserved. In this paper we propose a geometric framework in which such discretizations can be realized systematically, enabling the derivation of "rate-matching" algorithms without the need for a discrete convergence analysis. More specifically, we show that a generalization of symplectic integrators to nonconservative, and in particular dissipative, Hamiltonian systems is able to preserve rates of convergence up to a controlled error. Moreover, such methods preserve a shadow Hamiltonian despite the absence of a conservation law, extending key results of symplectic integrators to nonconservative cases. Our arguments rely on a combination of backward error analysis with fundamental results from symplectic geometry. We stress that although the original motivation for this work was the application to optimization, where dissipative systems play a natural role, our results are fully general: they not only provide a differential geometric framework for dissipative Hamiltonian systems but also substantially extend the theory of structure-preserving integration.
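To make the idea of a dissipative generalization of a symplectic integrator concrete, the following is a minimal sketch, not taken from the paper. It assumes the damped Hamiltonian system q' = p, p' = -∇f(q) - γp and discretizes it with a conformal symplectic Euler step, one well-known scheme of this type. The function name `conformal_symplectic_euler`, the quadratic test objective, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def conformal_symplectic_euler(grad_f, q0, p0, step, damping, n_steps):
    """Integrate q' = p, p' = -grad_f(q) - damping * p (an assumed model system).

    Each step damps the momentum exactly by exp(-damping * step), then applies
    a symplectic Euler update (momentum kick followed by position drift).
    """
    q = np.asarray(q0, dtype=float).copy()
    p = np.asarray(p0, dtype=float).copy()
    decay = np.exp(-damping * step)
    trajectory = [q.copy()]
    for _ in range(n_steps):
        p = decay * p - step * grad_f(q)  # exact damping, then gradient kick
        q = q + step * p                  # drift using the updated momentum
        trajectory.append(q.copy())
    return np.array(trajectory), p


# Hypothetical test problem: an ill-conditioned quadratic f(q) = 0.5 * q^T A q.
A = np.diag([1.0, 10.0])


def f(q):
    return 0.5 * q @ A @ q


def grad_f(q):
    return A @ q


traj, p_final = conformal_symplectic_euler(
    grad_f, q0=[1.0, 1.0], p0=[0.0, 0.0], step=0.1, damping=1.0, n_steps=200
)

# For a 1D quadratic with curvature lam, one step is a linear map on (q, p).
# Its determinant equals exp(-damping * step): phase-space area contracts at
# the same rate as in the continuous-time system.
lam, h, gamma = 10.0, 0.1, 1.0
mu = np.exp(-gamma * h)
step_matrix = np.array([[1.0 - h**2 * lam, h * mu],
                        [-h * lam, mu]])
area_contraction = np.linalg.det(step_matrix)
```

In this sketch the update coincides with a classical momentum (heavy-ball) iteration with momentum factor exp(-γh) and learning rate h². This is one way to see why structure-preserving discretizations of dissipative systems are natural candidates for optimization algorithms.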
