With the goal of training all the parameters of a neural network, we study why and when this can be achieved by iteratively creating, training, and combining randomly selected subnetworks. Such scenarios have emerged, implicitly or explicitly, in the recent literature: see, e.g., the Dropout family of regularization techniques, or distributed ML training protocols that reduce communication/computation complexities, such as the Independent Subnet Training protocol. While these methods are studied empirically and utilized in practice, they often enjoy partial or no theoretical support, especially when applied to neural network-based objectives. In this manuscript, our focus is on overparameterized single hidden layer neural networks with ReLU activations in the lazy training regime. By carefully analyzing $i)$ the subnetworks' neural tangent kernel, $ii)$ the surrogate functions' gradient, and $iii)$ how we sample and combine the surrogate functions, we prove a linear convergence rate of the training error -- up to a neighborhood around the optimal point -- for an overparameterized single-hidden layer perceptron with a regression loss. Our analysis reveals how the size of this neighborhood depends on the number of surrogate models and the number of local training steps for each selected subnetwork. Moreover, the considered framework generalizes and provides new insights into dropout training, multi-sample dropout training, as well as Independent Subnet Training; for each case, we provide convergence results as corollaries of our main theorem.
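To make the create/train/combine scheme concrete, the following is a minimal sketch, assuming a single-hidden-layer ReLU network with a fixed output layer (lazy regime), a squared regression loss, uniform Bernoulli sampling of hidden neurons, and simple averaging of the surrogates; all names and constants are illustrative and do not come from the paper.

```python
import numpy as np

# Hypothetical data and dimensions for a regression task.
rng = np.random.default_rng(0)
n, d, m = 128, 10, 512                      # samples, input dim, hidden width (overparameterized)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Single-hidden-layer ReLU network: f(x) = a^T relu(W x), with a fixed (lazy training regime).
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def grad_W(W_sub, a_sub):
    """Gradient of 0.5/n * ||f_sub(X) - y||^2 w.r.t. the sampled rows of W."""
    pre = X @ W_sub.T                       # (n, |mask|) pre-activations
    resid = np.maximum(pre, 0.0) @ a_sub - y
    return ((resid[:, None] * (pre > 0)) * a_sub).T @ X / n

T, K, S = 50, 4, 3                          # outer rounds, surrogates per round, local steps
lr, keep = 0.5, 0.25                        # local step size, fraction of neurons kept

for t in range(T):
    updates = np.zeros_like(W)
    counts = np.zeros(m)
    for k in range(K):                      # create K surrogate subnetworks
        mask = rng.random(m) < keep         # randomly select hidden neurons
        W_sub = W[mask].copy()
        for s in range(S):                  # a few local training steps on the surrogate
            W_sub -= lr * grad_W(W_sub, a[mask])
        updates[mask] += W_sub
        counts[mask] += 1
    hit = counts > 0                        # combine: average surrogates where sampled
    W[hit] = updates[hit] / counts[hit][:, None]
```

The outer loop mirrors the quantities the abstract highlights: the neighborhood of convergence in the analysis depends on the number of surrogates per round (`K` here) and the number of local steps (`S` here); setting `K = 1` recovers a dropout-style update, while larger `K` with disjoint masks resembles Independent Subnet Training.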