The practical aspects of evaluating recommender systems are an actively discussed topic in the research community. While many current evaluation techniques reduce performance to a single-value metric as a straightforward approach for model comparison, this rests on the strong assumption that a method's performance is stable over time. In this paper, we argue that discarding a method's continuous performance can lead to losing valuable insight into joint data-method effects. We propose the Cross-Validation Through Time (CVTT) technique to perform more detailed evaluations, which focus on model cross-validation performance over time. Using the proposed technique, we conduct a detailed analysis of popular RecSys algorithms' performance across various metrics and datasets. We also compare several data preparation and evaluation strategies to analyze their impact on model performance. Our results show that model performance can vary significantly over time, and that both data and evaluation setup can have a marked effect on it.
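The core idea of evaluating cross-validation performance over time can be illustrated with a minimal sketch. The protocol below is an assumption for illustration, not the paper's implementation: interactions are split at a sequence of time boundaries, a model is trained on all data before each test window, and one metric value is reported per window rather than a single aggregate. A toy popularity baseline and hit rate stand in for the RecSys algorithms and metrics studied in the paper.

```python
# Hedged sketch of time-sliced cross-validation (protocol details assumed,
# not taken from the paper). One metric value is produced per time fold.
from collections import Counter

def popularity_top_k(train_events, k):
    """Toy baseline: recommend the k most popular items in the training data."""
    counts = Counter(item for _, item, _ in train_events)
    return [item for item, _ in counts.most_common(k)]

def hit_rate_at_k(test_events, recommended):
    """Fraction of test interactions whose item appears in the recommendation list."""
    if not test_events:
        return 0.0
    hits = sum(1 for _, item, _ in test_events if item in recommended)
    return hits / len(test_events)

def cvtt_evaluate(events, fold_edges, k=2):
    """events: (user, item, timestamp) tuples; fold_edges: sorted boundary timestamps.

    For each consecutive pair of boundaries, train on everything before the
    window start and test on interactions inside the window, so performance
    is tracked fold by fold instead of collapsed into one number.
    """
    scores = []
    for start, end in zip(fold_edges, fold_edges[1:]):
        train = [e for e in events if e[2] < start]
        test = [e for e in events if start <= e[2] < end]
        recs = popularity_top_k(train, k)
        scores.append(hit_rate_at_k(test, recs))
    return scores
```

Plotting the returned per-fold scores (rather than averaging them) is what exposes the performance drift over time that a single-value metric hides.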