Contrastive self-supervised learning methods have achieved great success in recent years. However, self-supervision requires extremely long training schedules (e.g., 800 epochs for MoCo v3) to achieve promising results, which is unaffordable for much of the academic community and hinders the development of this topic. This work revisits momentum-based contrastive learning frameworks and identifies an inefficiency: two augmented views generate only one positive pair. We propose Fast-MoCo, a novel framework that utilizes combinatorial patches to construct multiple positive pairs from two augmented views, providing abundant supervision signals that bring significant acceleration at negligible extra computational cost. Fast-MoCo trained for 100 epochs achieves 73.5% linear evaluation accuracy, similar to MoCo v3 (ResNet-50 backbone) trained for 800 epochs. Further training (200 epochs) improves the result to 75.1%, which is on par with state-of-the-art methods. Experiments on several downstream tasks also confirm the effectiveness of Fast-MoCo.
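To make the combinatorial-patch idea concrete, the following is a minimal sketch of how one augmented view could be divided into patches, encoded, and recombined into multiple embeddings, each of which can then be paired with the momentum-encoder embedding of the other view. The function name `combinatorial_embeddings` and the specific choices (a 2×2 grid, averaging pairs of patch features) are illustrative assumptions, not the paper's exact configuration.

```python
import itertools
import torch

def combinatorial_embeddings(view, encoder, grid=2, combine_size=2):
    """Sketch of a divide-encode-combine step (assumed parameterization):
    split one augmented view into grid x grid patches, encode each patch,
    then average every combination of `combine_size` patch embeddings to
    form multiple embeddings from a single view.
    """
    B, C, H, W = view.shape
    ph, pw = H // grid, W // grid
    # Divide: crop the view into grid*grid non-overlapping patches.
    patches = [view[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(grid) for j in range(grid)]
    # Encode: run the shared encoder on each patch independently.
    feats = [encoder(p) for p in patches]              # each: (B, D)
    # Combine: average every subset of `combine_size` patch features,
    # yielding C(grid*grid, combine_size) embeddings per image.
    combos = [torch.stack(c, dim=0).mean(dim=0)
              for c in itertools.combinations(feats, combine_size)]
    return torch.stack(combos, dim=1)                  # (B, n_combos, D)
```

Each of the `n_combos` embeddings forms a positive pair with the other view's embedding, so a single pair of augmented views yields multiple supervision signals per image instead of one.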