The lottery ticket hypothesis (LTH) claims that randomly initialized, dense neural networks contain (sparse) subnetworks that, when trained an equal amount in isolation, can match the dense network's performance. Although the LTH is useful for discovering efficient network architectures, its three-step process of pre-training, pruning, and re-training is computationally expensive, as the dense model must be fully pre-trained. Luckily, "early-bird" tickets can be discovered within neural networks that are only minimally pre-trained, allowing for the creation of efficient, LTH-inspired training procedures. Yet no theoretical foundation for this phenomenon exists. We derive an analytical bound on the number of pre-training iterations that must be performed for a winning ticket to be discovered, thus providing a theoretical understanding of when and why such early-bird tickets exist. By adopting a greedy forward selection pruning strategy, we directly connect the pruned network's performance to the loss of the dense network from which it was derived, revealing a threshold in the number of pre-training iterations beyond which high-performing subnetworks are guaranteed to exist. We demonstrate the validity of our theoretical results across a variety of architectures and datasets, including multi-layer perceptrons (MLPs) trained on MNIST and several deep convolutional neural network (CNN) architectures trained on CIFAR10 and ImageNet.
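To make the pruning strategy concrete, below is a minimal, self-contained sketch of greedy forward selection on a toy two-layer network. This is not the paper's implementation: the toy network, the random data, the averaged-output form of the subnetwork, and the `masked_loss` and `greedy_forward_select` helpers are all illustrative assumptions. The subnetwork starts empty and, at each step, the hidden neuron of the dense model whose addition most reduces the loss is kept.

```python
# A minimal sketch of greedy forward selection pruning on a toy
# two-layer network. Illustrative only; not the paper's implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Dense two-layer network x -> W2 @ relu(W1 @ x). In the setting described
# above, W1 and W2 would come from a (minimally) pre-trained dense model;
# random weights stand in for them here to keep the sketch short.
d_in, d_hidden, d_out, n = 20, 64, 10, 256
W1 = torch.randn(d_hidden, d_in) / d_in ** 0.5
W2 = torch.randn(d_out, d_hidden) / d_hidden ** 0.5
X = torch.randn(n, d_in)                      # toy input batch
y = torch.randint(0, d_out, (n,))             # toy labels

def masked_loss(active):
    """Cross-entropy loss of the subnetwork using only `active` neurons."""
    mask = torch.zeros(d_hidden)
    mask[list(active)] = 1.0
    hidden = F.relu(X @ W1.t()) * mask        # zero out pruned neurons
    # Averaged-output form: the subnetwork's prediction is the mean of
    # the selected neurons' contributions.
    logits = (hidden @ W2.t()) / len(active)
    return F.cross_entropy(logits, y).item()

def greedy_forward_select(budget):
    """Grow the subnetwork greedily: at each step, keep the single hidden
    neuron whose inclusion yields the largest loss reduction."""
    selected, remaining = [], set(range(d_hidden))
    for _ in range(budget):
        best = min(remaining, key=lambda i: masked_loss(selected + [i]))
        selected.append(best)
        remaining.remove(best)
        print(f"kept {len(selected):2d} neurons, loss = {masked_loss(selected):.4f}")
    return selected

subnetwork = greedy_forward_select(budget=8)
```

The averaged output (dividing by the number of selected neurons) follows a formulation used in greedy-forward-selection analyses; it keeps the subnetwork's loss comparable across subnetwork sizes, which is the kind of property that allows the pruned network's loss to be related directly to the dense network's loss.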