Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been remarkably successful at learning them in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the problem studied in each setting, relying on different assumptions about the distribution, optimization parameters, and network architecture, making it difficult to generalize across settings. In this work, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and the proxy Polyak-Łojasiewicz (PL) inequality, which are satisfied if the original objective function induces a proxy objective that is implicitly minimized when using gradient methods. We show that stochastic gradient descent (SGD) on objectives satisfying proxy convexity or a proxy PL inequality yields efficient guarantees for the corresponding proxy objectives. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.
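The two notions above can be sketched informally as follows; the notation here is illustrative and not necessarily the paper's exact definitions. A loss $f$ is proxy convex with respect to a proxy objective $g$ if a convexity-style first-order bound holds through $g$, and satisfies a proxy PL inequality if the gradient norm of $f$ controls the suboptimality of $g$:

```latex
% Illustrative sketch (our notation): f is the training objective,
% g the induced proxy objective, and \mu > 0 a constant.
\langle \nabla f(w),\, w - v \rangle \;\ge\; g(w) - g(v)
  \quad \text{for all } w, v \qquad \text{(proxy convexity)}
\|\nabla f(w)\|^{2} \;\ge\; \mu \bigl( g(w) - \inf_{u} g(u) \bigr)
  \quad \text{for all } w \qquad \text{(proxy PL inequality)}
```

When $g = f$ these reduce to ordinary convexity and the ordinary PL inequality, so the proxy notions generalize both.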
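As a toy numerical illustration of the kind of setting the abstract describes (our own construction, not the paper's experiments): full-batch gradient descent on a single leaky-ReLU neuron with squared loss, a non-convex objective on which gradient methods nevertheless make steady progress.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1  # leaky-ReLU negative slope

def phi(z):
    # Leaky ReLU activation
    return np.where(z > 0, z, alpha * z)

def phi_prime(z):
    # Its (almost-everywhere) derivative
    return np.where(z > 0, 1.0, alpha)

# Teacher-student setup: labels are produced by a planted neuron w_star,
# so the global minimum of the loss is zero.
d, n = 5, 200
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = phi(X @ w_star)

def loss(w):
    return 0.5 * np.mean((phi(X @ w) - y) ** 2)

def grad(w):
    z = X @ w
    return ((phi(z) - y) * phi_prime(z)) @ X / n

w = rng.normal(size=d)  # random student initialization
initial = loss(w)
for _ in range(500):
    w -= 0.1 * grad(w)
final = loss(w)
print(initial, final)  # the loss drops well below its initial value
```

Despite the non-convexity introduced by the activation, the trajectory here behaves as the abstract suggests: gradient descent drives the loss down, which is the phenomenon the proxy-convexity/proxy-PL framework is built to explain.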