In this note we demonstrate provable convergence of SGD to the global minima of appropriately regularized $\ell_2$-empirical risk of depth-$2$ nets, for arbitrary data and with any number of gates, provided the gates use adequately smooth and bounded activations such as sigmoid and tanh. We build on the results in [1] and leverage a constant amount of Frobenius-norm regularization on the weights, along with sampling the initial weights from an appropriate distribution. We also give a continuous-time SGD convergence result that applies to smooth unbounded activations such as SoftPlus. Our key idea is to show the existence of loss functions on constant-sized neural nets which are "Villani Functions".
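To make the training setup concrete, here is a minimal sketch of the objective described above: the $\ell_2$-empirical risk of a depth-$2$ sigmoid net plus a constant amount of Frobenius-norm regularization, minimized by single-sample SGD from Gaussian initial weights. All names, dimensions, and the regularization level `lam` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n samples in d dimensions; a depth-2 net with p sigmoid gates.
n, d, p = 200, 5, 8
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W, a, x_i, y_i, lam):
    # Regularized l2 risk on one sample: squared error plus a constant
    # amount (lam) of Frobenius-norm regularization on all the weights.
    pred = a @ sigmoid(W @ x_i)
    return 0.5 * (pred - y_i) ** 2 + 0.5 * lam * (np.sum(W ** 2) + np.sum(a ** 2))

def grads(W, a, x_i, y_i, lam):
    # Exact gradients of the per-sample regularized loss above.
    h = sigmoid(W @ x_i)
    err = a @ h - y_i
    gW = err * np.outer(a * h * (1.0 - h), x_i) + lam * W
    ga = err * h + lam * a
    return gW, ga

# Initial weights sampled from a Gaussian (a stand-in for the paper's
# "appropriate distribution"); lam > 0 is an assumed regularization level.
W = rng.standard_normal((p, d))
a = rng.standard_normal(p)
lam, step = 0.1, 0.05

for t in range(5000):
    i = rng.integers(n)                      # one-sample SGD step
    gW, ga = grads(W, a, X[i], y[i], lam)
    W -= step * gW
    a -= step * ga
```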