In the past few years, we have witnessed remarkable breakthroughs in self-supervised representation learning. Despite the success and adoption of representations learned through this paradigm, much is yet to be understood about how different training methods and datasets influence performance on downstream tasks. In this paper, we analyze contrastive approaches as one of the most successful and popular variants of self-supervised representation learning. We perform this analysis from the perspective of the training algorithms, pre-training datasets and end tasks. We examine over 700 training experiments including 30 encoders, 4 pre-training datasets and 20 diverse downstream tasks. Our experiments address various questions regarding the performance of self-supervised models compared to their supervised counterparts, current benchmarks used for evaluation, and the effect of the pre-training data on end task performance. We hope the insights and empirical evidence provided by this work will help future research in learning better visual representations.