Many video-on-demand and music streaming services present the user with a page consisting of several recommendation lists, i.e., widgets or swipeable carousels, each built with a specific criterion (e.g., most recent, TV series). Finding efficient strategies to select which carousels to display is an active research topic of great industrial interest. In this setting, the overall quality of the recommendations of a new algorithm cannot be assessed by measuring only its individual recommendation quality. Rather, it should be evaluated in a context where other recommendation lists are already available, to account for how they complement each other. Traditional offline evaluation protocols do not take this into account. Hence, we propose an offline evaluation protocol for a carousel setting in which the recommendation quality of a model is measured by how much it improves upon that of an already available set of carousels. We report experiments on publicly available datasets in the movie domain and observe that, under a carousel setting, the ranking of the algorithms changes. In particular, when a SLIM carousel is available, matrix factorization models tend to be preferred, while item-based models are penalized. We also propose to extend ranking metrics to the two-dimensional carousel layout in order to account for a known position bias: users do not explore the lists sequentially, but rather concentrate on the top-left corner of the screen.
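The two ideas above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`filter_already_shown`, `carousel_recall`, `ndcg_2d`), the use of recall as the per-carousel metric, placing the new carousel as the last row of the page, and the product-of-logarithms discount are all illustrative assumptions. The first part scores a new carousel only on the relevant items it adds beyond those already displayed by the existing carousels. The second discounts each hit by both its row and its column, so that hits near the top-left corner count most.

```python
import math
from typing import Iterable, List, Sequence, Set


def filter_already_shown(candidates: Sequence[int],
                         existing_carousels: Iterable[Sequence[int]],
                         length: int) -> List[int]:
    """Drop candidate items already displayed by an existing carousel,
    then truncate the new carousel to `length` items."""
    shown: Set[int] = {item for row in existing_carousels for item in row}
    return [item for item in candidates if item not in shown][:length]


def carousel_recall(candidates: Sequence[int],
                    existing_carousels: Sequence[Sequence[int]],
                    relevant: Set[int],
                    length: int) -> float:
    """Recall of the new carousel alone, counting only the relevant items
    it contributes on top of the carousels already on the page
    (hypothetical choice of metric)."""
    if not relevant:
        return 0.0
    new_row = filter_already_shown(candidates, existing_carousels, length)
    return len(set(new_row) & relevant) / len(relevant)


def _discount(row: int, col: int) -> float:
    # Hypothetical 2D discount: the top-left cell gets 1.0 and the weight
    # decays logarithmically both down the rows and along each row.
    return 1.0 / (math.log2(row + 2) * math.log2(col + 2))


def ndcg_2d(page: Sequence[Sequence[int]], relevant: Set[int]) -> float:
    """NDCG over a grid of carousels (rows) using the 2D discount above."""
    seen: Set[int] = set()
    dcg = 0.0
    for r, row in enumerate(page):
        for c, item in enumerate(row):
            # Count each relevant item once, at its first (best) position.
            if item in relevant and item not in seen:
                seen.add(item)
                dcg += _discount(r, c)
    # Ideal DCG: relevant items placed in the cells with the largest discounts.
    discounts = sorted((_discount(r, c)
                        for r, row in enumerate(page) for c in range(len(row))),
                       reverse=True)
    ideal = sum(discounts[:len(relevant)])
    return dcg / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    existing = [[1, 2, 3], [4, 5, 6]]   # carousels already on the page
    candidates = [2, 7, 4, 8, 9]        # ranking produced by the new model
    relevant = {2, 7, 9}                # held-out test items of one user
    new_row = filter_already_shown(candidates, existing, 3)
    print(new_row)                                          # [7, 8, 9]
    print(carousel_recall(candidates, existing, relevant, 3))
    print(ndcg_2d(existing + [new_row], relevant))
```

In this example, item 2 is ignored when scoring the new carousel because the first existing carousel already shows it, so the new model gets credit only for items 7 and 9 (recall 2/3). Averaging such per-user scores over all users, and comparing models by that average, mirrors the idea of measuring how much a model improves upon the carousels already available.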