Transfer learning is the predominant paradigm for training deep networks on small target datasets. Models are typically pretrained on large ``upstream'' datasets for classification, as such labels are easy to collect, and then finetuned on ``downstream'' tasks such as action localisation, which are smaller due to their finer-grained annotations. In this paper, we question this approach, and propose co-finetuning -- simultaneously training a single model on multiple ``upstream'' and ``downstream'' tasks. We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data, and also show how we can easily extend our approach to multiple ``upstream'' datasets to further improve performance. In particular, co-finetuning significantly improves the performance on rare classes in our downstream task, as it has a regularising effect, and enables the network to learn feature representations that transfer between different datasets. Finally, we observe that, by co-finetuning with public video classification datasets, we are able to achieve state-of-the-art results for spatio-temporal action localisation on the challenging AVA and AVA-Kinetics datasets, outperforming recent works which develop intricate models.
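To make the idea concrete, the sketch below shows one possible form of a co-finetuning update: a single shared backbone with one head per dataset, where every training step sums the losses from a batch of each ``upstream'' and ``downstream'' dataset. This is an illustrative sketch in PyTorch under our own assumptions, not the paper's implementation; the module names, the toy backbone, and the use of a cross-entropy loss for every task are simplifications (a localisation head would use its own task-specific loss).

\begin{verbatim}
# Illustrative sketch only: shared backbone, one head per dataset,
# per-dataset losses summed at every step. All names and shapes here
# are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class CoFinetuneModel(nn.Module):
    def __init__(self, backbone, feature_dim, classes_per_task):
        super().__init__()
        self.backbone = backbone            # shared video encoder
        self.heads = nn.ModuleDict({        # one lightweight head per dataset
            task: nn.Linear(feature_dim, n)
            for task, n in classes_per_task.items()
        })

    def forward(self, video, task):
        return self.heads[task](self.backbone(video))


def cofinetune_step(model, optimizer, batches):
    """One update from a dict {task_name: (inputs, labels)} that covers
    both the upstream and the downstream datasets."""
    optimizer.zero_grad()
    total_loss = torch.zeros(())
    for task, (inputs, labels) in batches.items():
        logits = model(inputs, task)
        # Cross-entropy is a simplification; a localisation head would
        # use its own task-specific loss here.
        total_loss = total_loss + nn.functional.cross_entropy(logits, labels)
    total_loss.backward()
    optimizer.step()
    return total_loss.item()


if __name__ == "__main__":
    # Toy backbone standing in for a video transformer.
    backbone = nn.Sequential(nn.Flatten(),
                             nn.Linear(3 * 8 * 16 * 16, 64), nn.ReLU())
    model = CoFinetuneModel(backbone, feature_dim=64,
                            classes_per_task={"kinetics": 400, "ava": 80})
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    batches = {
        "kinetics": (torch.randn(2, 3, 8, 16, 16), torch.randint(0, 400, (2,))),
        "ava": (torch.randn(2, 3, 8, 16, 16), torch.randint(0, 80, (2,))),
    }
    print(cofinetune_step(model, optimizer, batches))
\end{verbatim}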