Given a dense shallow neural network, we focus on iteratively creating, training, and combining randomly selected subnetworks (surrogate functions), towards training the full model. By carefully analyzing $i)$ the subnetworks' neural tangent kernel, $ii)$ the surrogate functions' gradient, and $iii)$ how we sample and combine the surrogate functions, we prove a linear convergence rate of the training error -- within an error region -- for an overparameterized single-hidden-layer perceptron with ReLU activations on a regression task. Our result implies that, for a fixed neuron selection probability, the error term decreases as we increase the number of surrogate models, and increases as we increase the number of local training steps for each selected subnetwork. The considered framework generalizes and provides new insights into dropout training, multi-sample dropout training, as well as Independent Subnet Training; for each case, we provide corresponding convergence results as corollaries of our main theorem.
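Below is a minimal sketch of the kind of sample-train-combine loop the abstract describes for a single-hidden-layer ReLU network on a regression task. It is illustrative only, not the paper's exact algorithm: the names (`p` for the neuron selection probability, `num_surrogates` for the number of surrogate models, `local_steps` for the number of local training steps), the NTK-style parameterization with fixed output weights, and the simple averaging of surrogate updates are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's exact scheme): repeatedly sample random
# subnetworks (surrogate functions) of a single-hidden-layer ReLU network,
# run a few local gradient steps on each, and average the resulting updates.
import numpy as np

rng = np.random.default_rng(0)

# Regression problem with a single-hidden-layer ReLU network.
d, m, n = 10, 256, 100                          # input dim, hidden width, samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = rng.standard_normal((m, d)) / np.sqrt(d)    # hidden-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)             # output weights (kept fixed, NTK-style assumption)

def predict(W, mask):
    # Forward pass of the subnetwork obtained by masking hidden neurons.
    h = np.maximum(X @ W.T, 0.0) * mask         # masked ReLU activations, shape (n, m)
    return h @ (a / np.sqrt(m))

def grad_W(W, mask):
    # Gradient of the mean squared loss w.r.t. W for the masked subnetwork.
    pre = X @ W.T                               # pre-activations, shape (n, m)
    err = (np.maximum(pre, 0.0) * mask) @ (a / np.sqrt(m)) - y
    gate = (pre > 0).astype(float) * mask       # ReLU derivative times the mask
    return ((err[:, None] * gate) * (a / np.sqrt(m))).T @ X / n

p, num_surrogates, local_steps, lr = 0.5, 4, 2, 1.0  # illustrative hyperparameters

for t in range(200):
    updates = []
    for s in range(num_surrogates):
        mask = (rng.random(m) < p).astype(float)     # sample a subnetwork
        W_s = W.copy()
        for _ in range(local_steps):                 # local training of the surrogate
            W_s -= lr * grad_W(W_s, mask)
        updates.append(W_s - W)
    W = W + np.mean(updates, axis=0)                 # combine the surrogate updates

print("final training loss:", 0.5 * np.mean((predict(W, np.ones(m)) - y) ** 2))
```

With `num_surrogates = 1` and `local_steps = 1` the loop reduces to a dropout-style update; increasing `num_surrogates` resembles multi-sample dropout, and disjoint masks with several local steps resemble Independent Subnet Training, which is the sense in which the abstract presents these as special cases.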