Masked image modeling (MIM) learns representations with remarkably good fine-tuning performance, overshadowing previously prevalent pre-training approaches such as image classification, instance contrastive learning, and image-text alignment. In this paper, we show that the inferior fine-tuning performance of these pre-training approaches can be significantly improved by a simple post-processing step in the form of feature distillation (FD). Feature distillation converts the old representations into new representations that have a few desirable properties, much like the representations produced by MIM. These properties, which we collectively refer to as optimization friendliness, are identified and analyzed with a set of attention- and optimization-related diagnostic tools. With these properties, the new representations show strong fine-tuning performance. Specifically, contrastive self-supervised learning methods become as competitive in fine-tuning as state-of-the-art masked image modeling algorithms. The fine-tuning performance of CLIP models is also significantly improved, with a CLIP ViT-L model reaching \textbf{89.0\%} top-1 accuracy on ImageNet-1K classification. On the 3-billion-parameter SwinV2-G model, the fine-tuning accuracy on ADE20K semantic segmentation is improved by +1.5 mIoU to \textbf{61.4 mIoU}, setting a new record. More importantly, our work provides a way for future research to focus more effort on the generality and scalability of the learnt representations without being preoccupied with optimization friendliness, since it can be enhanced rather easily. The code will be available at https://github.com/SwinTransformer/Feature-Distillation.
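The core post-processing idea can be sketched as a feature-regression objective: a student network is trained to match the frozen teacher's features after the targets are whitened (normalized without learned affine parameters). The sketch below is a minimal NumPy illustration under those assumptions; function names, the smooth-L1 loss form, and the `beta` value are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def whiten(x, eps=1e-6):
    # Non-parametric normalization of teacher features along the channel
    # dimension (layer-norm style, no learned scale/shift) -- an assumed
    # stand-in for the paper's target whitening.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def smooth_l1(pred, target, beta=2.0):
    # Smooth-L1 (Huber-style) regression loss: quadratic near zero,
    # linear for large residuals.
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return loss.mean()

def feature_distillation_loss(student_feats, teacher_feats):
    # Teacher is frozen; its whitened features serve as regression
    # targets for the student's features of the same shape.
    return smooth_l1(student_feats, whiten(teacher_feats))
```

In a real training loop the student would be updated by gradient descent on this loss while the teacher stays fixed; the whitening step removes per-token scale and shift from the targets, which is part of what makes the distilled representations optimization friendly.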