Model training for recommender systems can be based on two types of data: explicit feedback and implicit feedback. Because it is widely available, implicit feedback such as click signals has seen wide adoption. Applying implicit feedback poses two main challenges. First, implicit data contains only positive feedback, so we cannot tell whether a non-interacted item is truly negative or is positive but was never displayed to the user. Second, the relevance of rare items is usually underestimated, because far less positive feedback is collected for rare items than for popular ones. To tackle these difficulties, both pointwise and pairwise solutions have been proposed for unbiased relevance learning. Because pairwise learning is well suited to ranking tasks, the previously proposed unbiased pairwise learning algorithm already achieves state-of-the-art performance. Nonetheless, the existing unbiased pairwise learning method suffers from high variance. To obtain satisfactory performance, it uses a non-negative estimator for practical variance control, which introduces additional bias. In this work, we propose an unbiased pairwise learning method, named UPL, with much lower variance to learn a truly unbiased recommender model. Extensive offline experiments on real-world datasets and online A/B testing demonstrate the superior performance of our proposed method.