Recommender systems are usually developed and evaluated on historical user-item logs. However, most offline recommendation datasets are highly sparse and contain various biases, which hampers the evaluation of recommendation policies. Existing efforts aim to improve data quality by collecting users' preferences on randomly selected items (e.g., Yahoo! and Coat). However, they still suffer from high variance because the observed data remain sparse. To address the problem at its root, we present KuaiRec, a fully-observed dataset collected from the social video-sharing mobile app Kuaishou. The feedback of 1,411 users on almost all of the 3,327 videos is explicitly observed. To the best of our knowledge, this is the first real-world fully-observed recommendation dataset with millions of user-item interactions. To demonstrate the advantage of KuaiRec, we use it to explore key questions in evaluating conversational recommender systems. The experimental results show that two factors in traditional partially-observed data, the data density and the exposure bias, greatly affect the evaluation results. This underscores the significance of our fully-observed data for many research directions in recommender systems, e.g., unbiased recommendation, interactive/conversational recommendation, and evaluation. We release the dataset and the pipeline implementation for evaluation at https://chongminggao.github.io/KuaiRec/.
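To make the two factors concrete, the following is a minimal sketch, not the released evaluation pipeline, of how partially-observed data with a chosen density and exposure bias could be simulated from a fully-observed matrix. A full matrix of 1,411 users and 3,327 videos would have 1,411 × 3,327 = 4,694,397 cells. The function names `density` and `subsample`, the `popularity_bias` parameter, and the convention that low item indices stand for popular items are all assumptions made for illustration.

```python
import random


def density(observed, n_users, n_items):
    """Fraction of all user-item pairs that have an observed interaction."""
    return len(observed) / (n_users * n_items)


def subsample(n_users, n_items, target_density, popularity_bias=0.0, seed=0):
    """Simulate a partially-observed log drawn from a fully-observed matrix.

    Assumption: item index doubles as a popularity rank (0 = most popular).
    popularity_bias=0 exposes items uniformly at random; larger values
    expose low-index items more often, mimicking exposure bias.
    Intended for toy sizes only (uses simple rejection sampling).
    """
    rng = random.Random(seed)
    items = range(n_items)
    weights = [1.0 / (rank + 1) ** popularity_bias for rank in items]
    per_user = round(target_density * n_items)  # items exposed to each user
    observed = set()
    for user in range(n_users):
        chosen = set()
        while len(chosen) < per_user:
            # Weighted draw; duplicates are discarded by the set.
            chosen.add(rng.choices(items, weights=weights)[0])
        observed.update((user, item) for item in chosen)
    return observed
```

Evaluating the same policy on `subsample(..., popularity_bias=0.0)` and on a biased sample at the same density, and comparing both against the full matrix, separates the effect of sparsity from the effect of exposure bias.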