We propose multirate training of neural networks: partitioning neural network parameters into "fast" and "slow" parts which are trained simultaneously using different learning rates. By choosing appropriate partitionings we can obtain large computational speed-ups for transfer learning tasks. We show that for various transfer learning applications in vision and NLP we can fine-tune deep neural networks in almost half the time, without reducing the generalization performance of the resulting model. We also discuss other splitting choices for the neural network parameters which are beneficial in enhancing generalization performance in settings where neural networks are trained from scratch. Finally, we propose an additional multirate technique which can learn different features present in the data by training the full network on different time scales simultaneously. The benefits of using this approach are illustrated for ResNet architectures on image data. Our paper unlocks the potential of using multirate techniques for neural network training and provides many starting points for future work in this area.
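To make the core idea concrete, the following is a minimal sketch (not the authors' exact algorithm) of training two parameter partitions simultaneously with different learning rates in PyTorch. The choice of ResNet-18, the 10-class head, and the specific learning rates are illustrative assumptions; the "fast" part here is the new classification head and the "slow" part is the pretrained backbone, mirroring the transfer learning setting described above.

```python
# Sketch: fast/slow parameter partition trained with different learning rates.
import torch
import torchvision

# Pretrained backbone with a new task-specific head (hypothetical 10-class task).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Partition parameters: the new head is "fast", the pretrained backbone is "slow".
fast_params = list(model.fc.parameters())
fast_ids = {id(p) for p in fast_params}
slow_params = [p for p in model.parameters() if id(p) not in fast_ids]

# Both groups are updated in the same optimizer.step(), i.e. simultaneously,
# but evolve on different time scales set by their learning rates.
optimizer = torch.optim.SGD(
    [
        {"params": fast_params, "lr": 1e-2},  # fast partition: large learning rate
        {"params": slow_params, "lr": 1e-4},  # slow partition: small learning rate
    ],
    momentum=0.9,
)
```

In this sketch the speed-up for transfer learning would come from the slow partition changing only slightly per step; the paper's own schemes for exploiting this (and the further splittings and multi-time-scale training mentioned above) are developed in the body of the work.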