Contrastive visual pretraining based on the instance discrimination pretext task has made significant progress. Notably, recent work on unsupervised pretraining has been shown to surpass its supervised counterpart when finetuning on downstream applications such as object detection and segmentation. It comes as a surprise that image annotations may be better left unused for transfer learning. In this work, we investigate the following problems: What makes instance discrimination pretraining good for transfer learning? What knowledge is actually learned and transferred from these models? Given this understanding of instance discrimination, how can we better exploit human annotation labels for pretraining? Our findings are threefold. First, what truly matters for transfer is the low-level and mid-level representations, not the high-level representations. Second, the intra-category invariance enforced by the traditional supervised model weakens transferability by increasing task misalignment. Finally, supervised pretraining can be strengthened by following an exemplar-based approach without explicit constraints among instances within the same category.
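To make the final point concrete, below is a minimal sketch of what an exemplar-based supervised objective could look like: an InfoNCE-style instance discrimination loss in which category labels are used only to mask same-category images out of the negative set, so no explicit invariance is enforced among instances of the same category. The function name `exemplar_loss`, its signature, and the masking scheme are illustrative assumptions, not the paper's implementation.

```python
# Sketch (not the authors' code): exemplar-style supervised contrastive loss.
# Each augmented view is matched only to its own instance; labels are used
# solely to remove same-category images from the negatives, without pulling
# instances of the same category together.
import torch
import torch.nn.functional as F

def exemplar_loss(z1, z2, labels, temperature=0.07):
    """z1, z2: L2-normalized embeddings of two views, shape (N, D).
    labels: category labels, shape (N,)."""
    logits = z1 @ z2.t() / temperature                        # (N, N) similarities
    n = z1.size(0)
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)   # (N, N) label match
    eye = torch.eye(n, dtype=torch.bool, device=z1.device)
    # Drop same-category pairs from the negatives, but keep the diagonal
    # (the instance's own second view) as the positive.
    logits = logits.masked_fill(same_class & ~eye, float('-inf'))
    targets = torch.arange(n, device=z1.device)               # positive = own view
    return F.cross_entropy(logits, targets)

# Example usage with random embeddings:
z1 = F.normalize(torch.randn(8, 128), dim=1)
z2 = F.normalize(torch.randn(8, 128), dim=1)
labels = torch.randint(0, 3, (8,))
loss = exemplar_loss(z1, z2, labels)
```

In this sketch the labels play a purely subtractive role, removing potential false negatives rather than enforcing intra-category invariance, which is one plausible reading of the exemplar-based approach described above.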