Training Deep Neural Networks (DNNs) is still highly time-consuming and compute-intensive. It has been shown that adapting a pretrained model can significantly accelerate this process. With a focus on classification, we show that current fine-tuning techniques make pretrained models catastrophically forget the transferred knowledge even before anything about the new task is learned. Such rapid knowledge loss undermines the merits of transfer learning and may result in a much slower convergence rate than when the maximum amount of knowledge is exploited. We investigate the source of this problem from different perspectives and, to alleviate it, introduce Fast And Stable Task-adaptation (FAST), an easy-to-apply fine-tuning algorithm. The paper provides a novel geometric perspective on how the loss landscapes of the source and target tasks are linked under different transfer learning strategies. We empirically show that, compared to prevailing fine-tuning practices, FAST learns the target task faster and forgets the source task more slowly.
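For concreteness, the sketch below illustrates the prevailing fine-tuning practice the abstract critiques, not the FAST algorithm itself: an ImageNet-pretrained backbone is given a fresh classifier head and all weights are updated with a single learning rate, which is the setting in which rapid loss of source knowledge is observed. The backbone choice, target class count, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of standard full-network fine-tuning (assumed baseline, not FAST).
import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 10  # assumed size of the new (target) classification task
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a source-pretrained model and swap in a new head for the target task.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_target_classes)
model.to(device)

# A single optimizer over all parameters: every pretrained weight is free to move.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One conventional fine-tuning step on a target-task mini-batch."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images.to(device)), labels.to(device))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Tracking source-task accuracy of such a model across the first few updates is one way to observe the early forgetting the abstract describes.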