Contrastive unsupervised representation learning (CURL) encourages data representations in which semantically similar pairs lie closer together than randomly drawn negative samples, and has been successful in domains such as vision, language, and graphs. Although recent theoretical studies have attempted to explain this success by upper-bounding a downstream classification loss with the contrastive loss, these bounds are still not sharp enough to explain an experimental fact: larger negative sample sizes improve classification performance. This study establishes a downstream classification loss bound whose intercept is tight in the negative sample size. By regarding the contrastive loss as an estimator of the downstream loss, our theory not only substantially improves the existing learning bounds but also explains why downstream classification empirically improves with more negative samples: the estimation variance of the downstream loss decays as the negative sample size grows. We verify that our theory is consistent with experiments on synthetic, vision, and language datasets.
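For concreteness, a standard formulation of the contrastive loss with $K$ negative samples (assumed here as the common InfoNCE-style objective; the abstract does not fix a specific form) is

\[
\mathcal{L}_{\mathrm{cont}}(f) \;=\; \mathbb{E}_{x,\, x^{+},\, \{x^{-}_{k}\}_{k=1}^{K}}\!\left[ -\log \frac{\exp\!\big(f(x)^{\top} f(x^{+})\big)}{\exp\!\big(f(x)^{\top} f(x^{+})\big) + \sum_{k=1}^{K} \exp\!\big(f(x)^{\top} f(x^{-}_{k})\big)} \right],
\]

where $x^{+}$ is drawn to be semantically similar to $x$, the negatives $x^{-}_{k}$ are drawn independently from the data distribution, and $K$ is the negative sample size discussed above.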