In this note we demonstrate provable convergence of SGD to the global minima of the appropriately regularized $\ell_2$-empirical risk of depth $2$ nets -- for arbitrary data and with any number of gates, provided the gates use adequately smooth and bounded activations such as sigmoid and tanh. We build on the results in [1] and leverage a constant amount of Frobenius norm regularization on the weights, along with sampling the initial weights from an appropriate distribution. We also give a continuous-time SGD convergence result that additionally applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of loss functions on constant-sized neural nets which are "Villani functions".

[1] Bin Shi, Weijie J. Su, and Michael I. Jordan. On learning rates and Schrödinger operators, 2020. arXiv:2004.06977.
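As a minimal sketch of the kind of objective referred to above (the notation $\widetilde{L}$, $k$, $d$, $\lambda$ is introduced here only for illustration; the precise constants, regularizer placement, and distributional assumptions are those of the paper, not this sketch): for a width-$k$ depth-$2$ net with activation $\sigma$ (e.g. sigmoid or tanh), outer weights $\mathbf{a} \in \mathbb{R}^k$, inner weight matrix $\mathbf{W} \in \mathbb{R}^{k \times d}$ with rows $\mathbf{w}_j$, training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, and a regularization coefficient $\lambda > 0$, a Frobenius-norm regularized $\ell_2$-empirical risk of the form in question is

$$\widetilde{L}(\mathbf{a}, \mathbf{W}) \;=\; \frac{1}{2n} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{k} a_j \, \sigma\big(\langle \mathbf{w}_j, \mathbf{x}_i \rangle\big) \Big)^2 \;+\; \frac{\lambda}{2} \Big( \|\mathbf{a}\|_2^2 + \|\mathbf{W}\|_F^2 \Big).$$

The claim outlined above is that, with the regularization coefficient bounded below by a suitable constant and the initial weights sampled from an appropriate distribution, such a loss is a Villani function, from which convergence of SGD (and of its continuous-time counterpart) to the global minima follows.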