In this paper, we propose a method, named EqCo (Equivalent Rules for Contrastive Learning), that makes self-supervised learning insensitive to the number of negative samples in the contrastive learning framework. Inspired by the InfoMax principle, we point out that the margin term in the contrastive loss needs to be adaptively scaled according to the number of negative pairs in order to keep the mutual information bound and the gradient magnitude steady. EqCo bridges the performance gap across a wide range of negative sample sizes, so that we can use only a few negative pairs (e.g., 16 per query) to perform self-supervised contrastive training on large-scale vision datasets like ImageNet with almost no drop in accuracy. This is in sharp contrast to the large-batch training or memory-bank mechanisms widely used in current practice. Equipped with EqCo, our simplified MoCo (SiMo) achieves accuracy comparable to MoCo v2 on ImageNet (linear evaluation protocol) while involving only 16 negative pairs per query instead of 65536, suggesting that a large quantity of negative samples might not be a critical factor in contrastive learning frameworks.
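To make the core idea concrete, below is a minimal sketch of how an InfoNCE loss with the adaptive scaling described above might be implemented. The function name eqco_info_nce, the argument names, and the default values are illustrative assumptions rather than the authors' reference code; the key point is that rescaling the negative term by alpha / K is equivalent to subtracting a margin of tau * log(alpha / K) from the positive logit, which keeps the loss "equivalent" to one computed with alpha negatives even when only K are used.

import math

import torch
import torch.nn.functional as F


def eqco_info_nce(q, k_pos, k_neg, tau=0.1, alpha=65536):
    """Hypothetical sketch of an EqCo-style rescaled InfoNCE loss.

    q:      (N, D) query embeddings, assumed L2-normalized
    k_pos:  (N, D) positive key embeddings
    k_neg:  (K, D) negative key embeddings; K may be small (e.g., 16)
    tau:    temperature
    alpha:  nominal number of negatives the loss should be equivalent to
    """
    K = k_neg.shape[0]
    pos = torch.sum(q * k_pos, dim=1, keepdim=True) / tau  # (N, 1) positive logits
    neg = (q @ k_neg.t()) / tau                             # (N, K) negative logits
    # Adding log(alpha / K) to each negative logit rescales the negative
    # term by alpha / K inside the softmax, i.e. the "margin" view of EqCo.
    neg = neg + math.log(alpha / K)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(q.shape[0], dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)

With alpha fixed (e.g., to the memory-bank size used by MoCo), the bound and gradient scale are, under this sketch, unchanged as K varies, which is what allows training with as few as 16 negatives per query.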