We propose multirate training of neural networks: partitioning neural network parameters into "fast" and "slow" parts which are trained on different time scales. By choosing appropriate partitionings we can obtain substantial computational speed-up for transfer learning tasks. We show for applications in vision and NLP that we can fine-tune deep neural networks in almost half the time, without reducing the generalization performance of the resulting models. We analyze the convergence properties of our multirate scheme and draw a comparison with vanilla SGD. We also discuss splitting choices for the network parameters that can enhance generalization performance when networks are trained from scratch. A multirate approach can further be used to learn different features present in the data and to act as a form of regularization. Our paper unlocks the potential of multirate techniques for neural network training and provides several starting points for future work in this area.
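To make the fast/slow partitioning concrete, below is a minimal sketch (not the authors' reference implementation) of multirate SGD in PyTorch: one group of parameters is updated every iteration, the other only once every k iterations. The choice of layers assigned to each group, the learning rates, and the value of k are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model: treat the first layer as the "slow" part and the final layer as the "fast" part.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
slow_params = list(model[0].parameters())
fast_params = list(model[2].parameters())

opt_slow = torch.optim.SGD(slow_params, lr=0.1)
opt_fast = torch.optim.SGD(fast_params, lr=0.1)

k = 5  # illustrative number of fast updates per slow update
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(64, 10)          # dummy batch
    y = torch.randint(0, 2, (64,))   # dummy labels
    loss = loss_fn(model(x), y)

    opt_slow.zero_grad()
    opt_fast.zero_grad()
    loss.backward()

    opt_fast.step()                  # fast part: updated on every iteration
    if step % k == 0:
        opt_slow.step()              # slow part: updated on a coarser time scale
```

Note that this sketch still backpropagates through the whole network on every step; in a transfer-learning setting, the computational saving described in the abstract would come from skipping gradient computation for the slow (e.g., pre-trained) part on the fast-only steps, for instance by detaching or freezing those layers.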