The Lipschitz constant is an important quantity that arises in analysing the convergence of gradient-based optimization methods. It is generally unclear how to estimate the Lipschitz constant of a complex model, so this paper studies an important problem that may be useful to the broader area of non-convex optimization. The main result provides a local upper bound on the Lipschitz constants of a multi-layer feed-forward neural network and of its gradient. Lower bounds are established as well and are used to show that it is impossible to derive global upper bounds for these Lipschitz constants. In contrast to previous works, the authors compute the Lipschitz constants with respect to the network parameters rather than with respect to the inputs. These constants are needed for the theoretical description of many step-size schedulers in gradient-based optimization schemes and for their convergence analysis. The idea is both simple and effective. The results are extended to a generalization of neural networks, continuously deep neural networks, which are described by controlled ODEs.
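To make the object of study concrete, the following is a minimal numerical sketch, not the paper's bound: it probes the local Lipschitz constant of a small feed-forward network with respect to its parameters by sampling pairs of parameter vectors inside a ball of radius r around a reference point and recording the largest observed difference quotient. The two-layer architecture, the radius r, and the sampling scheme are all illustrative assumptions; the sampled maximum is only a lower estimate of the true local constant, whereas the paper derives analytical upper (and lower) bounds.

```python
# Illustrative sketch (assumptions throughout): empirically estimate a *local*
# Lipschitz constant of a small feed-forward network with respect to its
# parameters, by sampling parameter pairs in a ball of radius r around theta0
# and taking the largest ratio ||f(x; t1) - f(x; t2)|| / ||t1 - t2||.
import numpy as np

rng = np.random.default_rng(0)

# Two-layer network f(x; theta) = W2 @ tanh(W1 @ x + b1) + b2, parameters flattened into one vector.
shapes = [("W1", (16, 8)), ("b1", (16,)), ("W2", (4, 16)), ("b2", (4,))]
sizes = [int(np.prod(s)) for _, s in shapes]

def unflatten(theta):
    params, i = {}, 0
    for (name, shape), n in zip(shapes, sizes):
        params[name] = theta[i:i + n].reshape(shape)
        i += n
    return params

def forward(x, theta):
    p = unflatten(theta)
    h = np.tanh(p["W1"] @ x + p["b1"])
    return p["W2"] @ h + p["b2"]

theta0 = rng.standard_normal(sum(sizes)) * 0.1   # reference parameter vector
x = rng.standard_normal(8)                        # a fixed input
r = 0.05                                          # radius of the local parameter ball

best = 0.0
for _ in range(2000):
    # Draw two points uniformly at random directions, scaled to lie inside the ball.
    d1 = rng.standard_normal(theta0.size); d1 *= r * rng.random() / np.linalg.norm(d1)
    d2 = rng.standard_normal(theta0.size); d2 *= r * rng.random() / np.linalg.norm(d2)
    t1, t2 = theta0 + d1, theta0 + d2
    num = np.linalg.norm(forward(x, t1) - forward(x, t2))
    den = np.linalg.norm(t1 - t2)
    if den > 0:
        best = max(best, num / den)

print(f"empirical local parameter-Lipschitz estimate on a ball of radius {r}: {best:.4f}")
```

Such a sampling estimate can only certify a lower bound, which is exactly why analytical local upper bounds of the kind derived in the paper are needed, for example to justify step-size choices in gradient-based training.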