With the goal of training all the parameters of a neural network, we study why and when this can be achieved by iteratively creating, training, and combining randomly selected subnetworks. Such scenarios have emerged, implicitly or explicitly, in the recent literature: see, e.g., the Dropout family of regularization techniques, or distributed ML training protocols that reduce communication/computation complexities, such as the Independent Subnet Training protocol. While these methods are studied empirically and utilized in practice, they often enjoy partial or no theoretical support, especially when applied to neural network-based objectives. In this manuscript, our focus is on overparameterized single hidden layer neural networks with ReLU activations in the lazy training regime. By carefully analyzing $i)$ the subnetworks' neural tangent kernel, $ii)$ the surrogate functions' gradient, and $iii)$ how we sample and combine the surrogate functions, we prove a linear convergence rate of the training error -- up to a neighborhood around the optimal point -- for an overparameterized single-hidden layer perceptron with a regression loss. Our analysis reveals how the size of this neighborhood depends on the number of surrogate models and the number of local training steps for each selected subnetwork. Moreover, the considered framework generalizes and provides new insights into dropout training, multi-sample dropout training, as well as Independent Subnet Training; for each case, we provide convergence results as corollaries of our main theorem.
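To make the create/train/combine scheme concrete, the following is a minimal sketch, assuming a single-hidden-layer ReLU network with a fixed output layer (lazy regime), a squared regression loss, uniform Bernoulli sampling of hidden neurons, and simple averaging of the surrogates; all names and constants are illustrative and do not come from the paper.

```python
import numpy as np

# Hypothetical data and dimensions for a regression task.
rng = np.random.default_rng(0)
n, d, m = 128, 10, 512                      # samples, input dim, hidden width (overparameterized)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Single-hidden-layer ReLU network: f(x) = a^T relu(W x), with a fixed (lazy training regime).
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def grad_W(W_sub, a_sub):
    """Gradient of 0.5/n * ||f_sub(X) - y||^2 w.r.t. the sampled rows of W."""
    pre = X @ W_sub.T                       # (n, |mask|) pre-activations
    resid = np.maximum(pre, 0.0) @ a_sub - y
    return ((resid[:, None] * (pre > 0)) * a_sub).T @ X / n

T, K, S = 50, 4, 3                          # outer rounds, surrogates per round, local steps
lr, keep = 0.5, 0.25                        # local step size, fraction of neurons kept

for t in range(T):
    updates = np.zeros_like(W)
    counts = np.zeros(m)
    for k in range(K):                      # create K surrogate subnetworks
        mask = rng.random(m) < keep         # randomly select hidden neurons
        W_sub = W[mask].copy()
        for s in range(S):                  # a few local training steps on the surrogate
            W_sub -= lr * grad_W(W_sub, a[mask])
        updates[mask] += W_sub
        counts[mask] += 1
    hit = counts > 0                        # combine: average surrogates where sampled
    W[hit] = updates[hit] / counts[hit][:, None]
```

The outer loop mirrors the quantities the abstract highlights: the neighborhood of convergence in the analysis depends on the number of surrogates per round (`K` here) and the number of local steps (`S` here); setting `K = 1` recovers a dropout-style update, while larger `K` with disjoint masks resembles Independent Subnet Training.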