Selection bias is prevalent in the data used to train and evaluate recommendation systems with explicit feedback. For example, users tend to rate items they like. Moreover, when predicting a rating for a specific user, most recommendation algorithms rely too heavily on that user's rating (feedback) history. This introduces an implicit bias into the recommendation system, which we refer to as user feedback-loop bias in this paper. We propose a systematic and dynamic way to correct this bias and to obtain more diverse and objective recommendations by utilizing temporal rating information. Specifically, our method includes a deep-learning component that learns a dynamic embedding of each user's rating history, which is used to estimate the probability distribution over the items that the user rates sequentially. These estimated dynamic exposure probabilities are then used as propensity scores to train an inverse-propensity-scoring (IPS) rating predictor. We empirically validate the existence of user feedback-loop bias in real-world recommendation systems and compare the performance of our method with baseline models that either apply no de-biasing or use propensity scores estimated by other methods. The results show that our approach outperforms these baselines.
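To make the role of the propensity scores concrete, the following is a minimal sketch of a generic IPS-weighted squared-error loss, not the exact objective used in this paper. The function name `ips_weighted_mse`, the dense-matrix layout, and the clipping threshold `clip` are illustrative assumptions; in the proposed method the propensities would come from the learned dynamic exposure model, whereas here they are simply passed in as an array.

```python
import numpy as np


def ips_weighted_mse(ratings, predictions, propensities, observed, clip=1e-3):
    """Generic IPS-weighted squared error over a user-by-item matrix.

    Hypothetical helper for illustration only.
    ratings, predictions, propensities: arrays of shape (n_users, n_items).
    observed: boolean array of the same shape, True where a rating exists.
    clip: assumed lower bound on propensities, used to limit the variance
          caused by very small exposure probabilities.
    """
    ratings = np.asarray(ratings, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Clip propensities into [clip, 1] so the division below is safe.
    p = np.clip(np.asarray(propensities, dtype=float), clip, 1.0)

    # Squared error per cell; only observed cells contribute to the loss.
    sq_err = (ratings - predictions) ** 2
    weighted = np.where(observed, sq_err / p, 0.0)

    # Normalize by the total number of user-item pairs, not by the number
    # of observed ratings, as is usual for IPS estimators.
    return weighted.sum() / observed.size


# Toy example: two users, two items, two observed ratings.
ratings = [[5, 0], [0, 1]]
predictions = [[4, 3], [3, 3]]
propensities = [[0.5, 0.1], [0.1, 0.25]]
observed = [[True, False], [False, True]]

loss = ips_weighted_mse(ratings, predictions, propensities, observed)
```

In this toy example the rating with the lower propensity (0.25) is up-weighted more strongly than the one with the higher propensity (0.5), which is how an IPS objective compensates for items that a user was unlikely to rate.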