Recent works in self-supervised learning have advanced the state of the art by relying on the contrastive learning paradigm, which learns representations by pushing positive pairs, or similar examples from the same class, closer together while keeping negative pairs far apart. Despite these empirical successes, theoretical foundations are limited: prior analyses assume that positive pairs are conditionally independent given the class label, but recent empirical applications use heavily correlated positive pairs (i.e., data augmentations of the same image). Our work analyzes contrastive learning without assuming conditional independence of positive pairs, using a novel concept of an augmentation graph on data. Edges in this graph connect augmentations of the same datapoint, and ground-truth classes naturally form connected sub-graphs. We propose a loss that performs spectral decomposition on the population augmentation graph and can be succinctly written as a contrastive learning objective on neural net representations. Minimizing this objective leads to features with provable accuracy guarantees under linear probe evaluation. By standard generalization bounds, these accuracy guarantees also hold when minimizing the training contrastive loss. Empirically, the features learned by our objective can match or outperform several strong baselines on benchmark vision datasets. In all, this work provides the first provable analysis of contrastive learning in which guarantees for linear probe evaluation apply to realistic empirical settings.
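The abstract does not spell out the loss. As a hedged illustration, the sketch below assumes it takes the spectral form L(f) = -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2], where (x, x+) is a positive pair (two augmentations of the same datapoint) and x, x' are drawn independently. Under that assumption, on a finite augmentation graph with symmetric positive-pair weights W (summing to 1) and marginals w_x, writing F for the matrix whose rows are sqrt(w_x) f(x) gives L(f) = ||A - F F^T||_F^2 - ||A||_F^2 with A = D^{-1/2} W D^{-1/2}, so minimizing the loss amounts to a low-rank spectral factorization of A. The function names, the two-clique toy graph, and the minibatch estimator are hypothetical choices, not the authors' implementation.

```python
import numpy as np

# Hypothetical helpers; the loss form below is an assumption, not stated in the abstract.


def spectral_contrastive_loss(z1: np.ndarray, z2: np.ndarray) -> float:
    """Minibatch estimate; z1[i], z2[i] embed two augmentations of example i."""
    n = z1.shape[0]
    # Positive pairs: -2 * average inner product f(x)^T f(x+).
    pos = -2.0 * np.mean(np.sum(z1 * z2, axis=1))
    # Negative pairs: average squared inner product over mismatched (i, j), i != j.
    gram = z1 @ z2.T
    neg = np.mean(gram[~np.eye(n, dtype=bool)] ** 2)
    return float(pos + neg)


def population_loss(W: np.ndarray, f: np.ndarray) -> float:
    """Exact assumed loss on a finite augmentation graph.

    W[x, x'] is the (symmetric) probability of drawing (x, x') as a positive
    pair, so W sums to 1; row x of f is the feature f(x).
    """
    w = W.sum(axis=1)  # marginal probability of each augmented point
    inner = f @ f.T    # f(x)^T f(x') for every pair of nodes
    return float(-2.0 * np.sum(W * inner) + np.sum(np.outer(w, w) * inner**2))


def spectral_features(W: np.ndarray, k: int) -> np.ndarray:
    """Minimizer of population_loss over k-dim features (Eckart-Young on A)."""
    w = W.sum(axis=1)
    A = W / np.sqrt(np.outer(w, w))        # normalized adjacency D^-1/2 W D^-1/2
    evals, evecs = np.linalg.eigh(A)       # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]      # indices of the k largest eigenvalues
    U = evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))  # rows u_x
    return U / np.sqrt(w)[:, None]         # undo the reweighting: f(x) = u_x / sqrt(w_x)


# Toy graph: two classes, each a 3-node clique of mutually augmentable points.
W = np.kron(np.eye(2), np.ones((3, 3)))
W /= W.sum()
f = spectral_features(W, k=2)
print(np.round(f @ f.T, 6))             # 2 within a class, 0 across classes
print(round(population_loss(W, f), 6))  # approximately -2.0, i.e. -||A||_F^2
```

On this toy graph the learned features have inner product 2 within a class and 0 across classes, so a linear probe separates the two connected sub-graphs, which mirrors the intuition that ground-truth classes form connected components of the augmentation graph.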