Network pruning is an effective method for reducing the computational cost of over-parameterized neural networks so that they can be deployed on low-resource systems. Recent state-of-the-art techniques for retraining pruned networks, such as weight rewinding and learning rate rewinding, have been shown to outperform the traditional fine-tuning technique in recovering the lost accuracy (Renda et al., 2020), but so far it is unclear what accounts for this performance. In this work, we conduct extensive experiments to verify and analyze the uncanny effectiveness of learning rate rewinding. We find that the reason behind its success is the use of a large learning rate. A similar phenomenon can be observed in other learning rate schedules that involve large learning rates, e.g., the 1-cycle learning rate schedule (Smith et al., 2019). By leveraging the right learning rate schedule in retraining, we demonstrate a counter-intuitive phenomenon: randomly pruned networks can even achieve better performance than methodically pruned networks (fine-tuned with the conventional approach). Our results emphasize the crucial role of the learning rate schedule in retraining pruned networks - a detail often overlooked by practitioners when implementing network pruning.
One-sentence Summary: We study the effectiveness of different retraining mechanisms when pruning networks.
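To make the contrast between the retraining strategies concrete, below is a minimal, illustrative sketch (not the authors' implementation) of the learning rate each strategy would use at a given retraining epoch. The 90-epoch budget, base learning rate of 0.1, step decay points, and fine-tuning rate of 0.001 are placeholder assumptions chosen only for illustration.

```python
# Minimal sketch of retraining learning rate schedules for a pruned network.
# All constants (epoch counts, learning rates) are illustrative assumptions.

def original_schedule(epoch, total_epochs=90, base_lr=0.1):
    """Step schedule assumed for the original (dense) training run:
    base_lr, decayed by 10x at 1/3 and 2/3 of training."""
    if epoch < total_epochs // 3:
        return base_lr
    if epoch < 2 * total_epochs // 3:
        return base_lr / 10
    return base_lr / 100

def fine_tuning_lr(epoch, small_lr=0.001):
    """Conventional fine-tuning: retrain with a small, constant learning rate."""
    return small_lr

def lr_rewinding_lr(epoch, retrain_epochs, total_epochs=90):
    """Learning rate rewinding (Renda et al., 2020): replay the final
    `retrain_epochs` of the original schedule, so a long enough rewind
    retrains the pruned network with the large learning rate again."""
    return original_schedule(total_epochs - retrain_epochs + epoch, total_epochs)

def one_cycle_lr(epoch, retrain_epochs, max_lr=0.1, min_lr=0.001):
    """1-cycle schedule (Smith et al., 2019): ramp up to a large peak
    learning rate, then anneal back down over the retraining budget."""
    half = retrain_epochs / 2
    if epoch < half:
        return min_lr + (max_lr - min_lr) * epoch / half
    return max_lr - (max_lr - min_lr) * (epoch - half) / half

if __name__ == "__main__":
    # Compare the retraining learning rate at a few epochs, assuming the
    # pruned network is retrained for the full 90-epoch budget.
    for e in range(0, 90, 15):
        print(e,
              fine_tuning_lr(e),
              lr_rewinding_lr(e, retrain_epochs=90),
              round(one_cycle_lr(e, retrain_epochs=90), 4))
```

The sketch highlights the point made in the abstract: fine-tuning keeps the learning rate small throughout, whereas both learning rate rewinding and the 1-cycle schedule expose the pruned network to a large learning rate during retraining.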