We show that Contrastive Learning (CL) under a broad family of loss functions (including InfoNCE) admits a unified formulation as coordinate-wise optimization over the network parameters $\boldsymbol{\theta}$ and the pairwise importance weights $\alpha$, where the \emph{max player} $\boldsymbol{\theta}$ learns representations for contrastiveness, and the \emph{min player} $\alpha$ puts more weight on pairs of distinct samples that share similar representations. The resulting formulation, called $\alpha$-CL, not only unifies various existing contrastive losses, which differ in how the sample-pair importance $\alpha$ is constructed, but also extrapolates to novel contrastive losses beyond the popular ones, opening a new avenue for contrastive loss design. These novel losses yield performance comparable to (or better than) the classic InfoNCE loss on CIFAR-10 and STL-10. Furthermore, we analyze the max player in detail: we prove that with fixed $\alpha$, the max player is equivalent to Principal Component Analysis (PCA) for deep linear networks, and almost all local minima are global and rank-1, recovering optimal PCA solutions. Finally, we extend our analysis of the max player to 2-layer ReLU networks, showing that its fixed points can have higher ranks.
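To make the coordinate-wise view concrete, the following is a minimal sketch (not the paper's implementation) of one $\alpha$-CL step in PyTorch: the min player constructs the pair weights $\alpha$ from the current similarities (here via the InfoNCE-style softmax, which indeed up-weights distinct pairs with similar representations), and the max player then optimizes a weighted contrast objective with $\alpha$ held fixed. The temperature \texttt{tau}, the batch layout, and the helper name \texttt{alpha\_cl\_step} are our assumptions for illustration.

\begin{verbatim}
# Illustrative sketch of the alpha-CL coordinate-wise update (assumptions:
# two augmented views per sample, L2-normalized features, temperature tau).
import torch
import torch.nn.functional as F

def alpha_cl_step(z, z_pos, tau=0.5):
    """z, z_pos: (N, d) normalized representations of two views of N samples."""
    sim = z @ z.t() / tau                 # pairwise similarities s_ij
    pos = (z * z_pos).sum(dim=1) / tau    # positive-pair similarities s_ii
    neg = sim.clone()
    neg.fill_diagonal_(float('-inf'))     # exclude self-pairs from negatives
    # Min player: alpha_ij as softmax weights over pairs, so distinct pairs
    # with similar representations receive more weight (InfoNCE-style choice;
    # other constructions of alpha give other losses in the alpha-CL family).
    alpha = F.softmax(torch.cat([pos.unsqueeze(1), neg], dim=1), dim=1)
    alpha = alpha[:, 1:].detach()         # freeze alpha for the max player
    # Max player: pull positives together, push alpha-weighted negatives apart.
    objective = pos.mean() - (alpha * sim).sum(dim=1).mean()
    return -objective                     # loss to minimize w.r.t. theta
\end{verbatim}

Detaching $\alpha$ before the backward pass is what makes this a coordinate-wise scheme: the gradient flows only through the max player $\boldsymbol{\theta}$, while $\alpha$ is refreshed from the new similarities at the next step.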