In academic research, recommender models are often evaluated on offline datasets. The offline dataset is first split into training and test instances. All training instances are then modelled in a user-item interaction matrix, which is used to train recommender models. Many such offline evaluations ignore the global timeline in the data, which leads to "data leakage": a model learns from future data to predict a current value, making the evaluation unrealistic. In this paper, we evaluate the impact of "data leakage" using two widely adopted baseline models, BPR and NeuMF, on four popular offline datasets: MovieLens-25M, Yelp, Amazon-music, and Amazon-electronic. We show that access to different amounts of future data may improve or deteriorate a model's recommendation accuracy. That is, ignoring the global timeline in offline evaluation makes the performance of recommendation models incomparable. We share our understanding of these observations, highlight the importance of preserving the global timeline, and call for a revisit of recommender system offline evaluation.
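To make the leakage issue concrete, below is a minimal sketch (not the paper's evaluation code) contrasting a random hold-out split, which ignores the global timeline, with a split that holds out only the most recent interactions. The pandas DataFrame layout, column names, and the leakage measure are illustrative assumptions, not taken from the paper.

```python
# Sketch: random split vs. global-timeline split of user-item interactions.
# Assumes interactions are (user, item, timestamp) rows; all names here are
# illustrative assumptions, not the paper's actual protocol.
import pandas as pd

def random_split(df, test_frac=0.2, seed=0):
    """Randomly hold out interactions, ignoring the global timeline."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test

def global_timeline_split(df, test_frac=0.2):
    """Hold out the most recent interactions so training never sees the future."""
    df = df.sort_values("timestamp")
    cut = int(len(df) * (1 - test_frac))
    return df.iloc[:cut], df.iloc[cut:]

def leaked_fraction(train, test):
    """Fraction of test interactions that happen before some training interaction,
    i.e. cases where the trained model has effectively seen 'future' data."""
    return (test["timestamp"] < train["timestamp"].max()).mean()

if __name__ == "__main__":
    df = pd.DataFrame({
        "user": [1, 1, 2, 2, 3, 3, 4, 4],
        "item": [10, 11, 10, 12, 11, 13, 12, 13],
        "timestamp": [1, 2, 3, 4, 5, 6, 7, 8],
    })
    tr_rand, te_rand = random_split(df)
    tr_time, te_time = global_timeline_split(df)
    print("leakage with random split:  ", leaked_fraction(tr_rand, te_rand))
    print("leakage with timeline split:", leaked_fraction(tr_time, te_time))
```

With the random split, a sizeable fraction of test interactions predate training interactions, so the trained matrix encodes future behaviour; the timeline-preserving split drives that fraction to zero by construction.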