Frankle & Carbin (2019) show that dense, randomly initialized networks contain winning tickets (small but critical subnetworks) that can be trained in isolation to reach accuracy comparable to that of the dense network in a similar number of iterations. However, identifying these winning tickets still requires the costly train-prune-retrain process, which limits their practical benefit. In this paper, we discover for the first time that winning tickets can be identified at a very early stage of training, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates; we term these early-bird (EB) tickets. Our finding of EB tickets is consistent with recently reported observations that the key connectivity patterns of neural networks emerge early. Furthermore, we propose a mask distance metric that can be used to identify EB tickets with low computational overhead, without needing to know the true winning tickets that emerge after full training. Finally, we leverage the existence of EB tickets and the proposed mask distance to develop efficient training methods, which first identify EB tickets via low-cost schemes and then continue to train only the EB tickets towards the target accuracy. Experiments on various deep networks and datasets validate: 1) the existence of EB tickets and the effectiveness of mask distance in efficiently identifying them; and 2) that the proposed efficient training via EB tickets can achieve up to 4.7x energy savings while maintaining comparable or even better accuracy, demonstrating a promising and easily adopted method for tackling cost-prohibitive deep network training. Code is available at https://github.com/RICE-EIC/Early-Bird-Tickets.
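As a rough illustration of how a mask distance could flag an EB ticket, the sketch below assumes the distance is a normalized Hamming distance between binary pruning masks drawn at successive epochs, and that a ticket is declared once the most recent masks stop changing. The abstract does not specify any of this: the magnitude-based pruning criterion, the window size, the threshold, and the names `magnitude_mask`, `mask_distance`, and `EarlyBirdDetector` are all hypothetical choices made for the example.

```python
from collections import deque


def magnitude_mask(weights, prune_ratio):
    """Binary mask over a flat weight list (1 = keep, 0 = prune).

    Assumption: the smallest-magnitude weights are the ones pruned.
    """
    n_prune = int(len(weights) * prune_ratio)
    # Indices in ascending order of magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance: fraction of positions where masks differ."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    return sum(a != b for a, b in zip(mask_a, mask_b)) / len(mask_a)


class EarlyBirdDetector:
    """Signals an EB ticket once recent masks have stabilized.

    The window and threshold defaults are illustrative assumptions.
    """

    def __init__(self, prune_ratio=0.5, window=5, threshold=0.1):
        self.prune_ratio = prune_ratio
        self.threshold = threshold
        self.masks = deque(maxlen=window)  # keeps only the latest masks

    def update(self, weights):
        """Record this epoch's mask; return True if the masks are stable."""
        self.masks.append(magnitude_mask(weights, self.prune_ratio))
        if len(self.masks) < self.masks.maxlen:
            return False  # not enough history yet
        latest = self.masks[-1]
        earlier = list(self.masks)[:-1]
        return all(mask_distance(latest, m) < self.threshold for m in earlier)
```

In a training loop, `update` would be called once per epoch with the current weights (flattened); training of the dense network would stop the first time it returns `True`, and only the subnetwork selected by the latest mask would be trained further.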