Network pruning is an effective approach to reducing network complexity with an acceptable performance compromise. Existing studies achieve sparsity in neural networks via time-consuming weight tuning or complex search over networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing, sparse sub-networks, termed "lottery jackpots", exist in pre-trained models with unexpanded width and can be found without any weight tuning. For example, on CIFAR-10 we obtain a lottery jackpot that keeps only 10% of the parameters yet still matches the performance of the original dense VGGNet-19, without any modification of the pre-trained weights. Furthermore, we observe that the sparse masks derived from many existing pruning criteria overlap heavily with the searched mask of our lottery jackpot, and that magnitude-based pruning yields the mask most similar to ours. Based on this insight, we initialize our sparse mask with magnitude-based pruning, which reduces the cost of the lottery jackpot search by at least 3x while achieving comparable or even better performance. Specifically, our magnitude-based lottery jackpot removes 90% of the weights in ResNet-50, yet easily reaches more than 70% top-1 accuracy on ImageNet using only 10 search epochs. Our project is available at https://github.com/lottery-jackpot/lottery-jackpot.
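To make the magnitude-based mask initialization concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it keeps the largest-magnitude weights of a pre-trained tensor and zeroes the rest, producing the binary mask from which the lottery jackpot search would start. The function name and tensor shapes are illustrative assumptions.

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the largest-magnitude entries of a weight tensor.

    sparsity=0.9 keeps the top 10% of weights by absolute value.
    """
    num_keep = max(1, int(weight.numel() * (1.0 - sparsity)))
    # Threshold at the smallest magnitude among the kept weights.
    threshold = weight.abs().flatten().topk(num_keep).values.min()
    return (weight.abs() >= threshold).float()

# Illustrative usage: initialize the search mask from a pre-trained layer.
w = torch.randn(512, 256)            # stand-in for a pre-trained weight tensor
mask = magnitude_mask(w, sparsity=0.9)
sparse_w = w * mask                   # weights stay frozen; only the mask is searched
```

In this sketch the pre-trained weights are never updated; a subsequent mask-search phase would only decide which entries of the mask to flip.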