Transfer learning is a powerful tool enabling model training with limited amounts of data. This technique is particularly useful in real-world problems where data availability is often a serious limitation. The simplest transfer learning protocol is based on ``freezing'' the feature-extractor layers of a network pre-trained on a data-rich source task, and then adapting only the last layers to a data-poor target task. This workflow is based on the assumption that the feature maps of the pre-trained model are qualitatively similar to the ones that would have been learned with enough data on the target task. In this work, we show that this protocol is often sub-optimal, and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen. In particular, we make use of a controlled framework to identify the optimal transfer depth, which turns out to depend non-trivially on the amount of available training data and on the degree of source-target task correlation. We then characterize transfer optimality by analyzing the internal representations of two networks trained from scratch on the source and the target task through multiple established similarity measures.
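As a concrete illustration of the protocol discussed above, the following is a minimal sketch (assuming PyTorch and torchvision, which are not prescribed by the text) of freezing a pre-trained backbone up to a chosen transfer depth; the function name, the ResNet-18 choice, and the specific depth are hypothetical and only meant to make the idea explicit.

```python
import torch.nn as nn
from torchvision import models

def freeze_up_to(model: nn.Module, depth: int) -> nn.Module:
    """Freeze the first `depth` child blocks of a pre-trained model;
    the remaining blocks stay trainable (the 'transfer depth' of the text)."""
    for block in list(model.children())[:depth]:
        for p in block.parameters():
            p.requires_grad = False
    return model

# Hypothetical usage: a backbone pre-trained on a data-rich source task,
# adapted to a data-poor target task by re-training only the deeper layers.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new head for the target task
freeze_up_to(backbone, depth=6)  # the classic protocol would instead freeze all but the head
```

The classic "freeze the whole feature extractor" workflow corresponds to the largest possible depth; the point of the abstract is that an intermediate depth, tuned to data availability and task correlation, can perform better.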