Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks, such as MobileNet and EfficientNet. A common remedy is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher. However, pretraining a teacher model is time- and resource-consuming when one is not already available. In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model. Specifically, we show that the optimal training recipe for efficient models differs from that of larger models, and that reusing the training settings of ResNet50, as previous research does, is inappropriate. Additionally, we observe a common issue in contrastive learning where either the positive or negative views can be noisy, and propose a smoothed version of the InfoNCE loss to alleviate this problem. As a result, we improve the linear evaluation results from 36.3\% to 62.3\% for MobileNet-V3-Large and from 42.2\% to 65.8\% for EfficientNet-B0 on ImageNet, closing the accuracy gap to ResNet50 with $5\times$ fewer parameters. We hope our research will facilitate the usage of lightweight contrastive models.
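The abstract does not spell out the form of the smoothed InfoNCE loss. The following is a minimal sketch of one plausible reading, assuming a label-smoothing-style target: the positive view receives weight `1 - alpha` and the remaining `alpha` is spread evenly over the negatives. The function name `smoothed_info_nce`, the parameter `alpha`, and the uniform spreading rule are illustrative assumptions, not the authors' confirmed formulation.

```python
import math


def smoothed_info_nce(logits, positive_index, alpha=0.1):
    """Cross-entropy between a smoothed target and softmax(logits).

    logits: similarity scores between one query and K candidates,
            assumed to be already divided by the temperature.
    positive_index: position of the positive view among the candidates.
    alpha: smoothing strength (hypothetical parameter); alpha=0 recovers
           the standard InfoNCE loss.
    """
    k = len(logits)
    if k < 2:
        raise ValueError("need one positive and at least one negative")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")

    # Numerically stable log-softmax over the candidates.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]

    # Assumed smoothed target: 1 - alpha on the positive,
    # alpha / (K - 1) on each negative.
    loss = 0.0
    for i, lp in enumerate(log_probs):
        target = 1.0 - alpha if i == positive_index else alpha / (k - 1)
        loss -= target * lp
    return loss


if __name__ == "__main__":
    scores = [5.0, 0.0, 0.0]  # the positive (index 0) is scored confidently
    print(smoothed_info_nce(scores, 0, alpha=0.0))  # standard InfoNCE
    print(smoothed_info_nce(scores, 0, alpha=0.1))  # smoothed variant
```

Under this assumption, a nonzero `alpha` penalizes over-confident predictions, which is one way a loss could become more tolerant of noisy positive or negative views.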