Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance. But what if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we propose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justifications by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound for mutual information maximization based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to existing contrastive frameworks. We show that our approach provides consistent improvements over the state-of-the-art on image, video, and graph contrastive learning benchmarks that exhibit a variety of real-world noise patterns.
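To make the "drop-in replacement for InfoNCE" claim concrete, here is a minimal NumPy-only sketch of the standard InfoNCE loss that the proposed objective would substitute. The cosine-similarity scoring and the temperature value of 0.1 are common conventions, not details taken from this paper; the proposed robust loss itself is not reproduced here.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE over a batch: z1[i] and z2[i] form the positive
    pair, and the remaining rows of z2 act as negatives for z1[i]."""
    # L2-normalize so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives lie on the diagonal
```

A noisy (false positive) pair in this formulation is one where z1[i] and z2[i] share no underlying information, yet the loss still pulls them together; that is the failure mode the paper's robust loss is designed to tolerate.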