We consider the question: how can you sample good negative examples for contrastive learning? We argue that, as with metric learning, learning contrastive representations benefits from hard negative samples (i.e., points that are difficult to distinguish from an anchor point). The key challenge in using hard negatives is that contrastive methods must remain unsupervised, making it infeasible to adopt existing negative sampling strategies that use label information. In response, we develop a new class of unsupervised methods for selecting hard negative samples where the user can control the amount of hardness. A limiting case of this sampling results in a representation that tightly clusters each class and pushes different classes as far apart as possible. The proposed method improves downstream performance across multiple modalities, requires only a few additional lines of code to implement, and introduces no computational overhead.
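To make the "few additional lines of code" claim concrete, here is a minimal sketch of one way such hardness-controllable negative sampling can be realized: reweighting each in-batch negative by an importance weight that grows with its similarity to the anchor, inside a standard InfoNCE-style loss. The function name, signature, and the concentration parameter `beta` are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def hard_negative_contrastive_loss(anchor, positive, negatives, beta=1.0, tau=0.5):
    """InfoNCE-style loss with hardness-weighted negatives (illustrative sketch).

    anchor:    (B, D) anchor embeddings
    positive:  (B, D) positive-pair embeddings
    negatives: (B, N, D) candidate negative embeddings per anchor
    beta:      hardness concentration; beta=0 recovers uniform negatives
    tau:       softmax temperature
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = torch.exp((anchor * positive).sum(-1) / tau)                 # (B,)
    neg_logits = torch.einsum('bd,bnd->bn', anchor, negatives)            # (B, N)
    neg_sim = torch.exp(neg_logits / tau)                                 # (B, N)

    # Importance weights: upweight negatives close to the anchor.
    # Larger beta concentrates mass on harder negatives; the softmax
    # normalization times N keeps the overall scale of the negative term.
    weights = (beta * neg_logits).softmax(dim=-1)                          # (B, N)
    neg_term = (weights * neg_sim).sum(-1) * negatives.shape[1]           # (B,)

    return -torch.log(pos_sim / (pos_sim + neg_term)).mean()
```

Note the limiting behavior this sketch shares with the abstract's description: with `beta=0` the weights are uniform and the loss reduces to standard contrastive learning, while increasing `beta` shifts the effective negative distribution toward points that are hardest to distinguish from the anchor.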