Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been substantial recent progress in understanding the convergence of Gradient Flow (GF) on the population loss, partly due to the simplicity that a continuous-time analysis affords. An overarching theme of our paper is providing general conditions under which SGD converges, assuming that GF on the population loss converges. Our main tool to establish this connection is a general converse Lyapunov-like theorem, which implies the existence of a Lyapunov potential under mild assumptions on the rates of convergence of GF. In fact, using these potentials, we show a one-to-one correspondence between rates of convergence of GF and geometric properties of the underlying objective. When these potentials further satisfy certain self-bounding properties, we show that they can be used to provide a convergence guarantee for Gradient Descent (GD) and SGD (even when the paths of GF and GD/SGD are quite far apart). It turns out that these self-bounding assumptions are, in a sense, also necessary for GD/SGD to work. Using our framework, we provide a unified analysis for GD/SGD not only in classical settings such as convex losses or objectives satisfying PL/KL properties, but also for more complex problems including Phase Retrieval and Matrix square root, and we extend the results of the recent work of Chatterjee (2022).
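As a concrete, textbook-level instance of the kind of correspondence described above (not the paper's general construction), the Polyak-Lojasiewicz (PL) inequality can itself be read as a self-bounding Lyapunov-type condition: it converts a geometric property of the objective into an exponential rate for GF and a matching linear rate for GD. Here F denotes the population loss, F^* its infimum, mu the PL constant, and L an assumed smoothness constant.

\begin{align*}
  &\text{Gradient flow:} && \dot{w}(t) = -\nabla F\bigl(w(t)\bigr),\\
  &\text{PL inequality:} && \tfrac{1}{2}\,\bigl\|\nabla F(w)\bigr\|^2 \;\ge\; \mu\,\bigl(F(w) - F^*\bigr),\\
  &\text{GF rate:} && F\bigl(w(t)\bigr) - F^* \;\le\; e^{-2\mu t}\,\bigl(F(w(0)) - F^*\bigr),\\
  &\text{GD rate } (\eta \le 1/L): && F(w_{k+1}) - F^* \;\le\; (1 - \eta\mu)\,\bigl(F(w_k) - F^*\bigr).
\end{align*}

In this special case the potential is simply the suboptimality gap F(w) - F^*; the abstract's framework replaces it with a potential constructed from the converse Lyapunov theorem, which need not coincide with the loss itself.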