The learn-to-compare paradigm of contrastive representation learning (CRL), which contrasts positive samples against negative ones to learn representations, has achieved great success across a wide range of domains, including natural language processing, computer vision, information retrieval, and graph learning. While much research focuses on data augmentations, nonlinear transformations, or other specific components of CRL, the importance of negative sample selection is often overlooked in the literature. In this paper, we provide a systematic review of negative sampling (NS) techniques and discuss how they contribute to the success of CRL. As the core of this paper, we summarize existing NS methods into four categories, discussing the pros and cons of each genre, and conclude with several open research questions as future directions. By generalizing and aligning the fundamental NS ideas across multiple domains, we hope this survey can accelerate cross-domain knowledge sharing and motivate future research toward better CRL.
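To make the learn-to-compare idea concrete, the following is a minimal NumPy sketch of an InfoNCE-style contrastive objective with explicitly sampled negatives. The cosine similarity, the temperature value, and the random-negative strategy are illustrative assumptions, not the method of any particular surveyed paper.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor.

    The anchor is scored against its positive and a set of sampled
    negatives; the loss is low when the anchor is closer (in cosine
    similarity) to the positive than to the negatives.
    """
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Logit 0 is the positive pair; the rest are negative pairs.
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    # Softmax cross-entropy with the positive at index 0.
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.05 * rng.normal(size=8)       # perturbed view of the anchor
negatives = [rng.normal(size=8) for _ in range(4)]  # randomly sampled negatives
loss = info_nce_loss(anchor, positive, negatives)
```

How the negatives in this objective are chosen (randomly, as above, or by harder, more informative strategies) is precisely the design axis this survey categorizes.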