We study non-convex subgradient flows for training two-layer ReLU neural networks from a convex geometry and duality perspective. We characterize the implicit bias of unregularized non-convex gradient flow as convex regularization in an equivalent convex model. We then show that the limit points of non-convex subgradient flows can be identified via primal-dual correspondence in this convex optimization problem. Moreover, we derive a sufficient condition on the dual variables which ensures that the stationary points of the non-convex objective are KKT points of the convex objective, thus proving convergence of non-convex gradient flows to the global optimum. For a class of regular training data distributions, such as orthogonally separable data, we show that this sufficient condition holds. Therefore, non-convex gradient flows in fact converge to optimal solutions of a convex optimization problem. We present numerical results verifying the predictions of our theory for non-convex subgradient descent.
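As a companion to the numerical results mentioned above, the following is a minimal, self-contained sketch (not the paper's exact experiment) of training a two-layer ReLU network with plain subgradient descent on a toy orthogonally separable dataset; the network width, step size, loss, and data construction are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy orthogonally separable data (an assumption for illustration):
# positives cluster around +u, negatives around -u, so same-class inner
# products are positive and cross-class inner products are negative.
d, n_per_class = 2, 20
u = np.array([1.0, 0.5]) / np.linalg.norm([1.0, 0.5])
X_pos = u + 0.1 * rng.normal(size=(n_per_class, d))
X_neg = -u + 0.1 * rng.normal(size=(n_per_class, d))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
n = X.shape[0]

# Two-layer ReLU network f(x) = sum_j a_j * max(w_j . x, 0) with m hidden units.
m = 50
W = 0.1 * rng.normal(size=(m, d))   # first-layer weights
a = 0.1 * rng.normal(size=m)        # second-layer weights

def forward(X, W, a):
    H = np.maximum(X @ W.T, 0.0)    # hidden activations, shape (n, m)
    return H @ a                    # network outputs, shape (n,)

# Plain (sub)gradient descent on the squared loss; the subgradient of the
# ReLU at zero is taken as 0, i.e. ReLU'(z) = 1{z > 0}.
lr, steps = 0.05, 2000
for t in range(steps):
    H = np.maximum(X @ W.T, 0.0)
    resid = H @ a - y                                   # residuals, shape (n,)
    grad_a = H.T @ resid / n                            # subgradient w.r.t. a
    grad_W = ((resid[:, None] * (X @ W.T > 0)) * a[None, :]).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

print("final training loss:", 0.5 * np.mean((forward(X, W, a) - y) ** 2))
```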