Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning: given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many parameters, such as the choice of data augmentation, the number of negative examples, and the batch size; however, there is limited understanding of how these parameters interact and affect downstream performance. We focus on disambiguating the role of one of these parameters: the number of negative examples. Theoretically, we show the existence of a collision-coverage trade-off, suggesting that the optimal number of negative examples should scale with the number of underlying concepts in the data. Empirically, we scrutinize the role of the number of negatives in both NLP and vision tasks. In the NLP task, we find that the results broadly agree with our theory, while our vision experiments are murkier, with performance sometimes even being insensitive to the number of negatives. We discuss plausible explanations for this behavior and suggest future directions to better align theory and practice.
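To make the role of the number of negatives concrete, below is a minimal sketch of an InfoNCE-style contrastive objective in which the number of negatives k appears explicitly as a hyperparameter. This is an illustrative assumption, not the paper's implementation; the function name, the cosine-similarity choice, and the temperature value are all hypothetical.

```python
# Minimal sketch (assumed, not the paper's code): an InfoNCE-style loss
# where the number of negatives k is controlled by the caller.
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Cross-entropy loss for picking the positive out of k negatives.

    anchor, positive: 1-D encoded feature vectors of dimension d.
    negatives: array of shape (k, d) holding k negative feature vectors.
    """
    def cos(u, v):
        # Cosine similarity with a small epsilon for numerical safety.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

    # Similarity of the anchor to the positive (index 0) and each negative.
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    # Softmax cross-entropy with the positive as the correct class.
    logits -= logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]

# Example usage with random features: increasing k improves coverage of the
# concept space but raises the chance a negative collides with the anchor's
# latent concept, which is the trade-off discussed in the abstract.
rng = np.random.default_rng(0)
d, k = 128, 16
loss = info_nce_loss(rng.normal(size=d), rng.normal(size=d),
                     rng.normal(size=(k, d)))
print(loss)
```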