The remarkable ability of deep neural networks to perfectly fit training data when optimized by gradient-based algorithms is yet to be fully explained theoretically. Explanations in recent theoretical works rely on networks being wider by orders of magnitude than those used in practice. In this work, we take a step towards closing the gap between theory and practice. We show that a randomly initialized deep neural network with ReLU activation converges to a global minimum in a logarithmic number of gradient-descent iterations, under a considerably milder condition on its width. Our analysis is based on a novel technique of training a network with fixed activation patterns. We study the unique properties of this technique that enable the improved convergence, and show that the trained network can be transformed at any time into an equivalent ReLU network of reasonable size. We derive a tight finite-width Neural Tangent Kernel (NTK) equivalence, suggesting that neural networks trained with our technique generalize at least as well as their NTK, and that the equivalence can therefore be used to study generalization.
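The abstract does not spell out the construction, but as a rough illustration, the following is a minimal PyTorch sketch of one plausible reading of "training a network with fixed activation patterns": the binary ReLU gates are recorded at random initialization and frozen, and gradient descent then updates the weights under those fixed gates. All names, dimensions, and hyperparameters here are illustrative assumptions, not taken from the paper.

```python
import torch

# Sketch (assumption): freeze the ReLU activation pattern at initialization,
# then train the hidden weights with the gates held fixed. With fixed gates
# and fixed output weights, the model is linear in W for each input, so the
# squared loss below is convex in W.

torch.manual_seed(0)
n, d, m = 64, 10, 512                      # samples, input dim, hidden width (illustrative)
X = torch.randn(n, d)
y = torch.randn(n, 1)

W = torch.randn(m, d) / d ** 0.5           # hidden-layer weights (trainable)
v = torch.randn(1, m) / m ** 0.5           # output weights (kept fixed in this sketch)
W.requires_grad_(True)

with torch.no_grad():
    gates = (X @ W.t() > 0).float()        # activation pattern, frozen at initialization

opt = torch.optim.SGD([W], lr=1e-1)
for step in range(500):
    opt.zero_grad()
    pre = X @ W.t()                        # pre-activations with the current weights
    hidden = gates * pre                   # fixed gates replace ReLU's own data-dependent gating
    loss = ((hidden @ v.t() - y) ** 2).mean()
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4e}")
```

Under these assumptions, the gated model coincides with the ReLU network at initialization, and the convexity of the fixed-gate objective is one intuition for why such a scheme could admit a faster convergence analysis than training the ReLU network directly; the paper's actual technique and its conversion back to an equivalent ReLU network may differ from this sketch.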