Recent years have witnessed many successful applications of contrastive learning across diverse domains, yet its self-supervised variant still poses many open challenges. Because negative samples are drawn from unlabeled data, a randomly selected sample may actually be a false negative for an anchor, leading to incorrect encoder training. This paper proposes a new self-supervised contrastive loss, the BCL loss, which still uses random samples from the unlabeled data while correcting the resulting bias with importance weights. The key idea is to design the desired sampling distribution for drawing hard true negative samples under a Bayesian framework. A prominent advantage is that the desired sampling distribution has a parametric structure, with a location parameter for debiasing false negatives and a concentration parameter for mining hard negatives. Experiments validate the effectiveness and superiority of the BCL loss.
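To make the idea of importance-weighted negatives concrete, the following is a minimal, hypothetical PyTorch sketch of a re-weighted InfoNCE-style loss. The function name weighted_contrastive_loss and the parameters tau, loc, and conc are illustrative assumptions standing in for the temperature, location (debiasing), and concentration (hard-negative mining) parameters described above; this sketch does not reproduce the exact Bayesian weighting derived for the BCL loss.

import torch
import torch.nn.functional as F

def weighted_contrastive_loss(anchor, positive, negatives, tau=0.5, loc=0.1, conc=1.0):
    # anchor, positive: (B, D); negatives: (B, K, D) drawn at random from unlabeled data.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Temperature-scaled cosine similarities.
    pos_sim = (anchor * positive).sum(-1) / tau                      # (B,)
    neg_sim = torch.einsum('bd,bkd->bk', anchor, negatives) / tau    # (B, K)

    # Hypothetical importance weights over the random negatives:
    # the concentration parameter `conc` up-weights hard negatives (high similarity),
    # while the location parameter `loc` shifts mass away from likely false negatives.
    # Weights are normalized per anchor and detached so they act as sampling weights.
    weights = torch.softmax(conc * (neg_sim - loc), dim=-1).detach()

    # Re-weighted negative term, rescaled by K so its magnitude matches an unweighted sum.
    weighted_neg = (weights * neg_sim.exp()).sum(-1) * neg_sim.size(-1)

    # InfoNCE-style objective with the corrected negative term.
    loss = -pos_sim + torch.log(pos_sim.exp() + weighted_neg)
    return loss.mean()

if __name__ == "__main__":
    B, K, D = 8, 16, 128
    loss = weighted_contrastive_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, K, D))
    print(loss.item())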