Transferability estimation has been an essential tool for selecting a pre-trained model, and the layers in it, to transfer, so as to maximize performance on a target task and prevent negative transfer. Existing estimation algorithms either require intensive training on target tasks or have difficulty evaluating the transferability between layers. To this end, we propose a simple, efficient, and effective transferability measure named TransRate. With a single pass over the examples of a target task, TransRate measures transferability as the mutual information between the features of target examples extracted by a pre-trained model and their labels. We overcome the challenge of efficient mutual information estimation by resorting to the coding rate, which serves as an effective alternative to entropy. From the perspective of feature representation, TransRate evaluates both the completeness (whether features contain sufficient information for a target task) and the compactness (whether the features of each class are compact enough for good generalization) of pre-trained features. Theoretically, we analyze the close connection between TransRate and the performance after transfer learning. Despite its extraordinary simplicity, implementable in 10 lines of code, TransRate performs remarkably well in extensive evaluations on 32 pre-trained models and 16 downstream tasks.
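To make the idea concrete, below is a minimal sketch of a coding-rate-based TransRate score in NumPy. It is not the authors' reference implementation: the function names (`coding_rate`, `transrate`), the distortion parameter `eps`, and the exact normalization of the conditional term (a plain average over classes rather than a frequency-weighted one) are illustrative assumptions; only the overall structure, R(Z) minus R(Z|Y) computed from class-wise coding rates in a single pass over extracted features, follows the description in the abstract.

```python
import numpy as np

def coding_rate(Z, eps=1e-4):
    # Coding rate of features Z (n x d): 0.5 * logdet(I_d + 1/(n*eps) * Z^T Z).
    # Acts as a tractable surrogate for entropy (assumed form for this sketch).
    n, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (1.0 / (n * eps)) * Z.T @ Z)
    return 0.5 * logdet

def transrate(Z, y, eps=1e-4):
    # Z: (n, d) features extracted by a pre-trained model; y: (n,) integer labels.
    Z = Z - Z.mean(axis=0, keepdims=True)   # center features
    RZ = coding_rate(Z, eps)                # R(Z): "completeness" term
    K = int(y.max()) + 1
    RZY = 0.0
    for c in range(K):                      # R(Z|Y): class-wise "compactness" term
        RZY += coding_rate(Z[y == c], eps)
    return RZ - RZY / K                     # higher score = more transferable

# Toy usage: 200 examples, 64-dim features, 4 classes.
Z = np.random.randn(200, 64)
y = np.random.randint(0, 4, size=200)
print(transrate(Z, y))
```

In this form, a layer whose features are informative about the labels (large R(Z)) while each class occupies a small volume (small R(Z|Y)) receives a high score, matching the completeness/compactness reading above.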