We analyze the convergence of averaged stochastic gradient descent for overparameterized two-layer neural networks on regression problems. It was recently found that the neural tangent kernel (NTK) plays an important role in establishing the global convergence of gradient-based methods under the NTK regime, where the learning dynamics of overparameterized neural networks can be almost completely characterized by the dynamics in the associated reproducing kernel Hilbert space (RKHS). However, a sharp convergence rate analysis in the NTK regime is still lacking. In this study, we show that averaged stochastic gradient descent achieves the minimax optimal convergence rate, with a global convergence guarantee, by exploiting the complexities of the target function and of the RKHS associated with the NTK. Moreover, we show that a target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate through a smooth approximation of the ReLU network under certain conditions.
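To make the setting concrete, the following is a minimal sketch of averaged stochastic gradient descent on an overparameterized two-layer network with NTK-style scaling, applied to a toy regression problem. The width, step size, swish activation (as a stand-in for a smooth approximation of ReLU), fixed second layer, and toy target function are all illustrative assumptions, not values or choices taken from the paper.

```python
# Sketch: averaged SGD on an overparameterized two-layer network (NTK-style
# 1/sqrt(m) scaling). All hyperparameters below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
d, m, T, eta = 5, 2048, 5000, 0.5   # input dim, width, iterations, step size

# f_theta(x) = (1/sqrt(m)) * a^T sigma(W x); second layer a is fixed at random
# signs and only the first layer W is trained.
W = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(z):
    # smooth approximation of ReLU: z * sigmoid(z)
    return z * sigmoid(z)

def swish_grad(z):
    s = sigmoid(z)
    return s + z * s * (1.0 - s)

def predict(W, x):
    return a @ swish(W @ x) / np.sqrt(m)

def target(x):
    return np.sin(x[0]) + 0.5 * x[1]   # toy target function (assumption)

W_sum = np.zeros_like(W)               # accumulator for the averaged iterate
for t in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)             # sample inputs on the unit sphere
    y = target(x) + 0.1 * rng.normal() # noisy observation
    residual = predict(W, x) - y
    # gradient of 0.5 * residual^2 with respect to W
    grad_W = residual * np.outer(a * swish_grad(W @ x), x) / np.sqrt(m)
    W = W - eta * grad_W
    W_sum += W

W_bar = W_sum / T                      # averaged iterate used for prediction

# quick check of the averaged predictor on fresh samples
X_test = rng.normal(size=(200, d))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
err = np.mean([(predict(W_bar, x) - target(x)) ** 2 for x in X_test])
print(f"test MSE of averaged iterate: {err:.4f}")
```

The final predictor is built from the average of the iterates rather than the last iterate, which is the averaging scheme referred to in the abstract; the analysis in the paper concerns the statistical rate of this averaged estimator, not the particular toy configuration used here.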