Recommender models can be trained on two types of data: explicit feedback and implicit feedback. Implicit feedback, such as click signals, is widely adopted because it is generally available. Its application poses two main challenges. First, implicit data contains only positive feedback, so we cannot tell whether a non-interacted item is truly negative or is positive but was never displayed to the user. Second, the relevance of rare items is usually underestimated, since far less positive feedback is collected for rare items than for popular ones. To tackle these difficulties, both pointwise and pairwise solutions have been proposed for unbiased relevance learning. Because pairwise learning is well suited to ranking tasks, the previously proposed unbiased pairwise learning algorithm already achieves state-of-the-art performance. Nonetheless, the existing unbiased pairwise learning method suffers from high variance. To obtain satisfactory performance, a non-negative estimator is used for practical variance control, but it introduces additional bias. In this work, we propose an unbiased pairwise learning method, named UPL, with much lower variance to learn a truly unbiased recommender model. Extensive offline experiments on real-world datasets and online A/B testing demonstrate the superior performance of our proposed method.
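The trade-off between variance and bias mentioned above can be illustrated with a small simulation. The sketch below is not the UPL estimator proposed in this work. It is a generic, pointwise inverse-propensity-weighted loss for a single (user, item) pair, chosen only because it is the simplest setting in which the effect appears. All names and values are illustrative assumptions: `theta` is an assumed exposure propensity, `gamma` an assumed true relevance probability, `p` a fixed model prediction, and a click is assumed to occur only when the item is both exposed and relevant.

```python
import math
import random


def log_loss(p, label):
    """Binary cross-entropy of a predicted relevance p against a 0/1 label."""
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def ideal_loss(p, gamma):
    """Loss we would minimise if the true relevance probability gamma were known."""
    return gamma * log_loss(p, 1) + (1.0 - gamma) * log_loss(p, 0)


def ips_loss(p, click, theta):
    """Inverse-propensity-weighted loss for one logged observation.

    The weight click / theta exceeds 1 for a clicked item with theta < 1,
    so the coefficient (1 - weight) on the negative term becomes negative.
    """
    weight = click / theta
    return weight * log_loss(p, 1) + (1.0 - weight) * log_loss(p, 0)


def simulate(p=0.9, gamma=0.5, theta=0.1, n=200_000, seed=0):
    """Compare the raw IPS loss with its non-negative (clipped) variant."""
    rng = random.Random(seed)
    raw, clipped = [], []
    for _ in range(n):
        exposed = rng.random() < theta    # assumed exposure model
        relevant = rng.random() < gamma   # assumed relevance model
        click = 1 if (exposed and relevant) else 0
        loss = ips_loss(p, click, theta)
        raw.append(loss)
        clipped.append(max(loss, 0.0))    # non-negative estimator
    raw_mean = sum(raw) / n
    clipped_mean = sum(clipped) / n
    return ideal_loss(p, gamma), raw_mean, clipped_mean, min(raw)


if __name__ == "__main__":
    ideal, raw_mean, clipped_mean, raw_min = simulate()
    print(f"ideal loss        : {ideal:.3f}")
    print(f"IPS loss (mean)   : {raw_mean:.3f}")
    print(f"clipped IPS (mean): {clipped_mean:.3f}")
    print(f"smallest raw value: {raw_min:.3f}")
```

Under these assumptions the raw weighted loss averages out close to the ideal loss, but individual values swing far below zero when a rarely exposed item is clicked, which is the source of the high variance. Clipping at zero removes those swings, at the cost of shifting the average away from the ideal loss.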