Recent investigations in noise contrastive estimation suggest, both empirically and theoretically, that while having more "negative samples" in the contrastive loss improves downstream classification performance initially, beyond a threshold it hurts downstream performance due to a "collision-coverage" trade-off. But is such a phenomenon inherent in contrastive learning? We show, in a simple theoretical setting where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. Along the way, we give a structural characterization of the optimal representation in our framework for noise contrastive estimation. We also provide empirical support for our theoretical results on the CIFAR-10 and CIFAR-100 datasets.
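For concreteness, a minimal sketch of the contrastive loss with $k$ negative samples in the latent-class framework of Saunshi et al. (2019), written in the standard logistic (softmax) form; the symbols $f$, $x$, $x^+$, $x_i^-$ are generic notation for illustration rather than quotations from the paper:
\[
L_{\mathrm{un}}(f) \;=\; \mathbb{E}_{(x,\,x^+),\; x_1^-,\dots,x_k^-}\!\left[ \log\!\left( 1 + \sum_{i=1}^{k} \exp\!\big( f(x)^\top f(x_i^-) \;-\; f(x)^\top f(x^+) \big) \right) \right],
\]
where $(x, x^+)$ are drawn from the same latent class and the negatives $x_1^-,\dots,x_k^-$ are drawn i.i.d. from the marginal over classes. The "collision-coverage" intuition is that as $k$ grows, negatives are more likely to collide with the positive's latent class, even though larger $k$ also covers more classes.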