In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods that either rely on a large batch size or a large dictionary of feature vectors. We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point. From the optimization perspective, we explain why existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result. In order to remove such a requirement, we propose a memory-efficient Stochastic Optimization algorithm for solving the Global objective of Contrastive Learning of Representations, named SogCLR. We show that its optimization error is negligible under a reasonable condition after a sufficient number of iterations, or is diminishing for a slightly different global contrastive objective. Empirically, we demonstrate that SogCLR with a small batch size (e.g., 256) can achieve similar performance to SimCLR with a large batch size (e.g., 8192) on a self-supervised learning task on ImageNet-1K. We also attempt to show that the proposed optimization technique is generic and can be applied to solving other contrastive losses, e.g., two-way contrastive losses for bimodal contrastive learning. The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org).
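To illustrate the core idea of optimizing a global contrastive objective without a large batch, the following is a minimal, hypothetical sketch (not the authors' implementation; see LibAUC for that). It assumes a per-anchor buffer `u` that keeps a moving-average estimate of each anchor's dataset-wide sum of exponentiated negative similarities, so a small mini-batch can still approximate the global normalization term; the names `tau`, `gamma`, `u`, and `GlobalContrastiveLoss` are illustrative assumptions.

```python
# Illustrative sketch of a global contrastive loss with per-anchor moving-average
# estimators of the negative term (an assumption about the method, not the paper's code).
import torch


class GlobalContrastiveLoss(torch.nn.Module):
    def __init__(self, num_samples: int, tau: float = 0.1, gamma: float = 0.9):
        super().__init__()
        self.tau = tau      # temperature
        self.gamma = gamma  # moving-average coefficient
        # per-anchor moving-average estimates of the global negative term
        self.register_buffer("u", torch.zeros(num_samples))

    def forward(self, z1: torch.Tensor, z2: torch.Tensor, indices: torch.Tensor):
        # z1, z2: L2-normalized embeddings of two augmented views, shape (B, d)
        # indices: dataset indices of the anchors in this mini-batch, shape (B,)
        B = z1.shape[0]
        sim = z1 @ z2.t() / self.tau                      # (B, B) scaled similarities
        pos = sim.diag()                                  # positive-pair similarities
        neg_mask = ~torch.eye(B, dtype=torch.bool, device=sim.device)
        # mini-batch estimate of each anchor's exponentiated negative term
        g = (sim.exp() * neg_mask).sum(dim=1) / (B - 1)
        # update the per-anchor moving averages (no gradient through the buffer)
        with torch.no_grad():
            self.u[indices] = self.gamma * self.u[indices] + (1 - self.gamma) * g.detach()
        u = self.u[indices].clamp_min(1e-8)
        # surrogate whose gradient is -grad(pos) + grad(g) / u, mimicking the gradient
        # of -pos + log(global negative sum) with u standing in for the global sum
        loss = (-pos + g / u).mean()
        return loss
```

A training loop under this sketch would pass the normalized embeddings of the two views together with the mini-batch's dataset indices, e.g., `loss = criterion(z1, z2, idx)`, so that each anchor's estimate in `u` is refreshed across epochs; this is what removes the dependence on a large batch size in the approximation of the global normalization term.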