We investigate the generalization and optimization of $k$-homogeneous shallow neural-network classifiers in the interpolating regime. Our study focuses on settings where the model can perfectly classify the input data with a positive margin $\gamma$. For gradient descent on the logistic loss, we show that the training loss converges to zero at a rate of $\tilde O(1/(\gamma^{2/k} T))$ given a polylogarithmic number of neurons. This implies that gradient descent finds a perfect classifier for $n$ training examples within $\tilde{\Omega}(n)$ iterations. Additionally, through a stability analysis we show that with $m=\Omega(\log^{4/k}(n))$ neurons and $T=\Omega(n)$ iterations, the test loss is bounded by $\tilde{O}(1/(\gamma^{2/k} n))$. This contrasts with existing stability results, which require polynomial width and yield suboptimal generalization rates. Central to our analysis is a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Despite the objective's non-convexity, this ultimately yields convergence and generalization-gap bounds similar to those in the convex setting of linear logistic regression.
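For concreteness, here is a minimal sketch of the setting described above, in our own (hypothetical) notation: $f(\theta;\cdot)$ denotes the $k$-homogeneous shallow network with $m$ neurons, $\hat L$ the empirical logistic risk over the $n$ training pairs $(x_i,y_i)$ with labels $y_i\in\{\pm 1\}$, $L$ the test (population) loss, and $\theta_t$ the gradient-descent iterates with step size $\eta$; the displayed rates simply restate the bounds claimed above.
\[
  \hat L(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \log\!\bigl(1 + e^{-y_i f(\theta;\, x_i)}\bigr),
  \qquad
  \theta_{t+1} \;=\; \theta_t \;-\; \eta\, \nabla \hat L(\theta_t),
\]
\[
  \hat L(\theta_T) \;\le\; \tilde O\!\Bigl(\tfrac{1}{\gamma^{2/k}\, T}\Bigr)
  \ \text{ for } m = \mathrm{polylog}(n),
  \qquad
  L(\theta_T) \;\le\; \tilde O\!\Bigl(\tfrac{1}{\gamma^{2/k}\, n}\Bigr)
  \ \text{ for } m = \Omega\bigl(\log^{4/k}(n)\bigr),\ T = \Omega(n).
\]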