Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms avoid so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In particular, this is the case for rectified linear unit (ReLU) networks. In this paper, we prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements. Then, we verify that shallow ReLU networks fit into the new framework. Building on a classification of critical points of the square integral loss of shallow ReLU networks measured against an affine target function, we deduce that gradient descent avoids most saddle points. We proceed to prove convergence to global minima provided the initialization is sufficiently good, a condition expressed by an explicit threshold on the limiting loss.
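To fix ideas, here is a minimal sketch, in our own notation rather than the paper's, of the square integral loss of a shallow ReLU network measured against an affine target, and of the regularity issue that motivates the relaxed center-stable manifold theorem (the single-input, width-$n$ parametrization and the domain $[0,1]$ are illustrative assumptions):
\[
  f_\theta(x) \;=\; c + \sum_{j=1}^{n} v_j \,\max\{w_j x + b_j,\, 0\},
  \qquad
  \mathcal{L}(\theta) \;=\; \int_0^1 \bigl(f_\theta(x) - (a x + \alpha)\bigr)^2 \,\mathrm{d}x,
\]
where $\theta = (w_j, b_j, v_j, c)_{j=1}^{n}$ collects the parameters and $x \mapsto a x + \alpha$ is the affine target. A critical point of $\mathcal{L}$ is a strict saddle if the Hessian has a strictly negative eigenvalue there. The classical center-stable manifold theorem requires the gradient map to be continuously differentiable near the critical point, which can fail for ReLU networks because $\max\{\cdot, 0\}$ is not differentiable at $0$; relaxing this requirement is the role of the variant theorem described above.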