Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the particular setup in each problem, making it difficult to generalize across different settings. To address this drawback in the literature, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that gradient descent on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.
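As a rough illustration of these notions (the abstract does not state the formal definitions; the proxy function $g$, auxiliary function $h$, and constant $\mu > 0$ below are an assumed, hedged sketch rather than the paper's exact statements), proxy conditions of this kind can be written as

\[ \langle \nabla f(w),\, w - u \rangle \;\ge\; g(w) - h(u) \qquad \text{(proxy convexity of } f \text{ with proxies } g, h\text{)}, \]
\[ \|\nabla f(w)\|^{2} \;\ge\; \mu\bigl(g(w) - \inf_{v} g(v)\bigr) \qquad \text{(a proxy PL-type inequality)}. \]

Under conditions of this form, the standard gradient descent arguments bound progress along the trajectory in terms of $g$ rather than $f$, which is why the resulting guarantees are stated for the proxy objective.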