Recommender systems research tends to evaluate model performance offline on randomly sampled targets, yet the same systems are later used to predict user behavior sequentially from a fixed point in time. Simulating online recommender system performance is notoriously difficult, and the discrepancy between online and offline behavior is typically not accounted for in offline evaluations. This disparity lets weaknesses go unnoticed until the model is deployed in a production setting. In this paper, we first demonstrate how omitting temporal context when evaluating recommender system performance leads to false confidence. To overcome this, we postulate that offline evaluation protocols can model real-life use cases only if they account for temporal context. Next, we propose a training procedure to further embed temporal context in existing models: we introduce it as an additional objective in a multi-objective approach to traditionally time-unaware recommender systems and confirm its advantage via the proposed evaluation protocol. Finally, we validate on three real-world, publicly available datasets that the Pareto fronts obtained with the added objective dominate those produced by state-of-the-art models optimized only for accuracy. The results show that including our temporal objective can improve recall@20 by up to 20%.
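The temporal evaluation idea above contrasts with random hold-out sampling: instead of hiding a random subset of interactions, evaluation hides everything after a fixed point in time, so the model must predict future behavior from past behavior only. A minimal sketch of such a temporal split (the data layout, function name, and cutoff value are illustrative assumptions, not the paper's actual protocol):

```python
# Hypothetical interaction log: (user, item, timestamp) triples.
interactions = [
    ("u1", "i1", 100), ("u1", "i2", 205), ("u2", "i1", 150),
    ("u2", "i3", 260), ("u1", "i4", 310), ("u2", "i2", 320),
]

def temporal_split(events, cutoff):
    """Split interactions at a fixed point in time: everything before
    the cutoff becomes training data, everything at or after it becomes
    the test target. This mirrors how a deployed recommender predicts
    future behavior, unlike a random sample of held-out interactions,
    which leaks future context into training."""
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff]
    return train, test

train, test = temporal_split(interactions, cutoff=300)
print(len(train), len(test))  # 4 training events, 2 future test events
```

With a random split, the two post-cutoff interactions could land in the training set and inflate offline accuracy, which is exactly the source of false confidence the abstract describes.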