InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource-demanding, as their effectiveness breaks down with small-batch training (the so-called log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel, simple, non-trivial contrastive objective named FlatNCE that fixes this issue. Unlike InfoNCE, FlatNCE no longer explicitly appeals to a discriminative classification goal for contrastive learning. Theoretically, we show that FlatNCE is the mathematical dual formulation of InfoNCE, thereby bridging contrastive learning to the classical literature on energy modeling; empirically, we demonstrate that, with minimal modification of code, FlatNCE yields an immediate performance boost independent of subject-matter engineering efforts. The significance of this work is furthered by the powerful generalization of contrastive learning techniques and by the introduction of new tools to monitor and diagnose contrastive training. We substantiate our claims with empirical evidence on CIFAR10, ImageNet, and other datasets, where FlatNCE consistently outperforms InfoNCE.
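To make the "minimal modification of code" claim concrete, below is a minimal PyTorch-style sketch of a FlatNCE-type objective. It assumes critic scores for positive and negative pairs have already been computed; the function name `flatnce_loss` and the tensor shapes are illustrative assumptions, not the authors' released code. The key step, self-normalizing the exponentiated relative logits with a detached copy, reflects the self-normalized construction the abstract alludes to.

```python
import torch

def flatnce_loss(pos_logits: torch.Tensor, neg_logits: torch.Tensor) -> torch.Tensor:
    """FlatNCE-style contrastive loss (illustrative sketch).

    pos_logits: (N, 1) critic scores g(x, y) for the positive pairs.
    neg_logits: (N, K) critic scores g(x, y_k) for K negatives per anchor.

    For comparison, InfoNCE would be:
        -pos_logits.squeeze(1)
        + torch.logsumexp(torch.cat([pos_logits, neg_logits], dim=1), dim=1)
    """
    diff = neg_logits - pos_logits              # relative logits g(x, y_k) - g(x, y), shape (N, K)
    clogits = torch.logsumexp(diff, dim=1)      # shape (N,)
    # Self-normalize with a detached copy: the loss value is identically 1,
    # but its gradient equals that of logsumexp(diff), so the training signal
    # does not degrade when the number of negatives K is small.
    return torch.exp(clogits - clogits.detach()).mean()
```

Because the forward value is constant, the loss curve itself is uninformative; quantities such as `clogits` serve instead as diagnostics for monitoring contrastive training.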