Deep candidate generation (DCG), which narrows the collection of relevant items from billions down to hundreds via representation learning, has become prevalent in industrial recommender systems. Standard approaches approximate maximum likelihood estimation (MLE) through sampling for better scalability and address DCG in a way similar to language modeling. However, live recommender systems face severe exposure bias and have a vocabulary several orders of magnitude larger than that of natural language, implying that MLE will preserve and even exacerbate the exposure bias in the long run in order to faithfully fit the observed samples. In this paper, we theoretically prove that a popular choice of contrastive loss is equivalent to reducing the exposure bias via inverse propensity weighting, which provides a new perspective for understanding the effectiveness of contrastive learning. Based on this theoretical discovery, we design CLRec, a contrastive learning method that improves DCG in terms of fairness, effectiveness, and efficiency in recommender systems with extremely large candidate sets. We further improve upon CLRec and propose Multi-CLRec for accurate, multi-intention-aware bias reduction. Our methods have been successfully deployed in Taobao, where online A/B tests and offline analyses spanning at least four months demonstrate their substantial improvements, including a dramatic reduction in the Matthew effect.
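As an illustration of the kind of contrastive objective the abstract refers to, the sketch below implements an InfoNCE-style loss with in-batch negatives in NumPy. This is a generic sketch, not the paper's exact CLRec loss: the function name, temperature value, and use of cosine similarity are assumptions for illustration. The relevant intuition is that items enter the batch in proportion to how often they are clicked (their propensity), so the softmax denominator penalizes over-exposed items more heavily, which is the effect the paper relates to inverse propensity weighting.

```python
import numpy as np

def in_batch_contrastive_loss(user_emb, item_emb, temperature=0.1):
    """InfoNCE-style loss with in-batch negatives (illustrative sketch).

    user_emb, item_emb: arrays of shape (B, d); row i of item_emb is the
    positive item for row i of user_emb, and the other B-1 items in the
    batch serve as sampled negatives. Because negatives are drawn from
    the batch, frequently exposed items appear more often in the softmax
    denominator, implicitly down-weighting them.
    """
    # L2-normalize so logits are cosine similarities scaled by 1/temperature.
    u = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    v = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    logits = (u @ v.T) / temperature          # shape (B, B)

    # Cross-entropy with the diagonal (matched pairs) as the positive class.
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

With random user/item pairings the loss is close to log(B), while well-aligned user and item embeddings drive it toward zero, which is the training signal used to learn the two-tower representations.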