How to sample high-quality negative instances from unlabeled data, i.e., negative sampling, is important for training implicit collaborative filtering and contrastive learning models. Although previous studies have proposed approaches for sampling informative instances, little has been done to discriminate false negatives from true negatives for unbiased negative sampling. Based on an order relation analysis of negatives' scores, we first derive the class-conditional density of true negatives and that of false negatives. We next design a Bayesian classifier for negative classification, from which we define a model-agnostic posterior probability estimate of an instance being a true negative as a quantitative measure of negative signal. We also propose a Bayesian optimal sampling rule to sample high-quality negatives. The proposed Bayesian Negative Sampling (BNS) algorithm has linear time complexity. Experimental studies validate the superiority of BNS over its peers in terms of both sampling quality and recommendation performance.
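To make the posterior estimate concrete, the following is a minimal sketch of a Bayes-rule computation of this kind, not the paper's implementation. It assumes the class-conditional densities take the order-statistics form 2f(x)(1 - F(x)) for true negatives and 2f(x)F(x) for false negatives, where F is the cumulative distribution of negative scores; that form is not stated in the abstract. The function names `empirical_cdf` and `posterior_true_negative`, and the per-item prior `prior_false_negative`, are hypothetical.

```python
def empirical_cdf(scores, x):
    """Fraction of observed scores that are less than or equal to x."""
    return sum(s <= x for s in scores) / len(scores)


def posterior_true_negative(cdf_value, prior_false_negative):
    """Posterior probability that an unlabeled instance is a true negative.

    Assumed densities (illustrative, not taken from the abstract):
      f_tn(x) = 2 f(x) (1 - F(x)),  f_fn(x) = 2 f(x) F(x).
    The common factor 2 f(x) cancels in Bayes' rule, so only F(x) and
    the prior are needed.
    """
    tn = (1.0 - cdf_value) * (1.0 - prior_false_negative)
    fn = cdf_value * prior_false_negative
    total = tn + fn
    if total == 0.0:
        # Degenerate case: the evidence is uninformative, fall back to the prior.
        return 1.0 - prior_false_negative
    return tn / total


# Usage: score every candidate against the empirical score distribution.
scores = [0.1, 0.2, 0.3, 0.4]
priors = [0.05, 0.05, 0.20, 0.20]  # hypothetical prior false-negative rates
posteriors = [
    posterior_true_negative(empirical_cdf(scores, s), p)
    for s, p in zip(scores, priors)
]
```

Under these assumptions, a higher predicted score and a higher prior both lower the posterior of being a true negative. A full sampling rule would also weigh how informative a candidate is, which this sketch leaves out.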