Deep transfer learning has been widely used for knowledge transfer in recent years. The standard pipeline of pre-training followed by fine-tuning, or linear probing, has proven effective on many downstream tasks. This raises a challenging and still-open question: how can we quantify cross-task transferability in a way that is consistent with actual transfer results while remaining self-consistent? Existing transferability metrics are estimated for a particular model by jointly considering source and target tasks, and must be recomputed against all existing source tasks whenever a new, unseen target task is encountered, which is extremely computationally expensive. In this work, we identify the properties a transferability metric should satisfy and evaluate existing metrics in light of these characteristics. Building on this analysis, we propose Principal Gradient Expectation (PGE), a simple yet effective method for assessing transferability across tasks. Specifically, we use a restart scheme to compute the gradient of every batch over each weight unit more than once, and average all of these gradients to obtain the expectation. The transferability between the source and target task is then estimated by computing the distance between their normalized principal gradients. Extensive experiments show that the proposed transferability metric is more stable, reliable, and efficient than state-of-the-art methods.
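To make the procedure concrete, the following is a minimal sketch of the idea described above, assuming a PyTorch model and data loader: per-batch gradients are accumulated over several restarted passes, averaged into an expectation, normalized, and compared across tasks by a simple distance. The function names (principal_gradient_expectation, pge_transferability), the use of Euclidean distance, and the handling of restarts (repeated passes without re-initialization) are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def principal_gradient_expectation(model, loss_fn, data_loader, num_restarts=3):
    """Sketch: approximate the principal gradient expectation of a task.

    For each restart, accumulate per-batch gradients of the loss w.r.t. every
    model weight, then average over all batches and restarts.
    """
    accumulated = [torch.zeros_like(p) for p in model.parameters()]
    num_batches = 0
    for _ in range(num_restarts):  # restart scheme: repeat the gradient passes
        for inputs, targets in data_loader:
            model.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            for acc, p in zip(accumulated, model.parameters()):
                if p.grad is not None:
                    acc += p.grad.detach()
            num_batches += 1
    # Expectation: mean gradient over all batches and restarts, flattened into one vector
    expectation = torch.cat([(g / num_batches).flatten() for g in accumulated])
    # Normalize so gradients from tasks with different loss scales are comparable
    return expectation / (expectation.norm() + 1e-12)

def pge_transferability(grad_source, grad_target):
    """Sketch: transferability as the negative distance between the normalized
    principal gradients of the source and target tasks (smaller distance,
    higher transferability)."""
    return -torch.norm(grad_source - grad_target).item()
```

In this sketch, a new target task only requires computing its own normalized principal gradient once; comparisons against all stored source-task gradients then reduce to cheap vector distances, which is the efficiency argument made above.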