Top-N recommendation, which aims to learn users' ranking-based preferences, has long been a fundamental problem in a wide range of applications. Traditional models are usually motivated by different assumptions and rely on complex or tailored architectures. However, the training data of recommender systems can be extremely sparse and imbalanced, which poses great challenges for improving recommendation performance. To alleviate this problem, we propose to reformulate the recommendation task within the causal inference framework, which enables us to counterfactually simulate users' ranking-based preferences and thereby handle the data scarcity problem. The core of our model lies in the counterfactual question: "what would the user's decision be if the recommended items had been different?" To answer this question, we first formulate the recommendation process with a series of structural equation models (SEMs), whose parameters are optimized on the observed data. Then, we actively specify many recommendation lists that are not recorded in the dataset (called interventions in causal inference terminology), and simulate user feedback according to the learned SEMs to generate new training samples. Instead of randomly intervening on the recommendation list, we design a learning-based method to discover more informative training samples. Considering that the learned SEMs may be imperfect, we finally analyze theoretically the relation between the number of generated samples and the model prediction error, based on which a heuristic method is designed to control the negative effect of the prediction error. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our framework.
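The pipeline outlined in the abstract (fit a feedback model on observed data, intervene on the recommendation list, simulate feedback to obtain new samples) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's actual model: the SEM is replaced by a simple softmax choice model over dot-product scores, interventions are drawn uniformly at random rather than by the learning-based method, and the cap `max_generated` only stands in for the heuristic that limits the effect of prediction error. The names `fit_feedback_model`, `simulate_counterfactuals`, and `max_generated` are hypothetical.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def fit_feedback_model(logs, n_users, n_items, dim=8, lr=0.1, epochs=50, seed=0):
    """Fit a stand-in feedback model on observed (user, shown_list, clicked) logs.

    Assumption: the click is drawn from a softmax over the shown items,
    with score = <user embedding, item embedding>. Each shown list is
    assumed to contain distinct items.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, dim))
    V = 0.1 * rng.standard_normal((n_items, dim))
    for _ in range(epochs):
        for user, shown, clicked in logs:
            shown = np.asarray(shown)
            probs = softmax(V[shown] @ U[user])
            # Gradient of the log-likelihood with respect to the scores.
            err = (shown == clicked).astype(float) - probs
            grad_u = err @ V[shown]
            grad_v = np.outer(err, U[user])
            U[user] += lr * grad_u
            V[shown] += lr * grad_v
    return U, V


def simulate_counterfactuals(U, V, user, list_size, max_generated, rng):
    """Intervene on the recommendation list and simulate the user's feedback.

    Assumption: lists are sampled uniformly at random, and max_generated
    caps how many simulated samples are produced for this user.
    """
    n_items = V.shape[0]
    samples = []
    for _ in range(max_generated):
        # Intervention: a list that need not appear in the observed logs.
        shown = rng.choice(n_items, size=list_size, replace=False)
        probs = softmax(V[shown] @ U[user])
        clicked = int(rng.choice(shown, p=probs))
        samples.append((user, shown.tolist(), clicked))
    return samples


# Toy usage: three observed impressions, then augmentation for user 0.
observed = [(0, [0, 1, 2], 1), (0, [1, 3, 4], 1), (1, [2, 3, 5], 5)]
U, V = fit_feedback_model(observed, n_users=2, n_items=6)
augmented = observed + simulate_counterfactuals(
    U, V, user=0, list_size=3, max_generated=4, rng=np.random.default_rng(1)
)
```

The augmented list mixes observed and simulated impressions and could then be passed to any Top-N ranking model. Keeping `max_generated` small reflects the abstract's point that samples produced by an imperfect feedback model carry prediction error, so the number of generated samples should be controlled.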