Recent research has proposed different metrics to measure the inconsistency of recommendation performance, including the accuracy difference between user groups, miscalibration, and popularity lift. However, a study that relates miscalibration and popularity lift to recommendation accuracy across different user groups is still missing. Additionally, it is unclear whether particular genres contribute to the emergence of inconsistency in recommendation performance across user groups. In this paper, we analyze these three aspects for five well-known recommendation algorithms, using user groups that differ in their preference for popular content. We also study how different genres affect the inconsistency of recommendation performance, and how this aligns with the popularity of the genres. Using data from LastFm, MovieLens, and MyAnimeList, we present two key findings. First, we find that users with little interest in popular content receive the worst recommendation accuracy, and that this is aligned with miscalibration and popularity lift. Second, our experiments show that particular genres contribute to different extents to the inconsistency of recommendation performance, especially in terms of miscalibration on the MyAnimeList dataset.
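The two per-user metrics named above can be illustrated with a small sketch. It assumes commonly used formulations, not necessarily the exact ones used in this paper. Miscalibration is taken as the Kullback-Leibler divergence between the genre distribution of a user's profile (p) and that of their recommendation list (q), with q smoothed towards p by a small alpha (0.01) as in Steck's calibrated recommendations. Popularity lift is taken as the relative change in mean item popularity from the profile to the recommendations. The function names, the 1/k weighting of multi-genre items, and the toy data are hypothetical.

```python
import math
from collections import Counter


def genre_distribution(items, item_genres):
    """Share of each genre over a list of items.

    Assumption: an item with k genres contributes 1/k to each of its genres.
    """
    counts = Counter()
    for item in items:
        genres = item_genres[item]
        for g in genres:
            counts[g] += 1.0 / len(genres)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def miscalibration(profile, recs, item_genres, alpha=0.01):
    """KL(p || q~) between profile genres p and recommended genres q.

    q~ = (1 - alpha) * q + alpha * p keeps the divergence finite when a
    profile genre is missing from the recommendations (assumed smoothing).
    """
    p = genre_distribution(profile, item_genres)
    q = genre_distribution(recs, item_genres)
    kl = 0.0
    for g, p_g in p.items():
        q_g = (1 - alpha) * q.get(g, 0.0) + alpha * p_g
        kl += p_g * math.log(p_g / q_g)
    return kl


def popularity_lift(profile, recs, item_pop):
    """Relative change of mean item popularity from profile to recommendations."""
    gap_profile = sum(item_pop[i] for i in profile) / len(profile)
    gap_recs = sum(item_pop[i] for i in recs) / len(recs)
    return (gap_recs - gap_profile) / gap_profile


# Hypothetical toy data: genres per item and popularity as the fraction of users
item_genres = {"a": ["rock"], "b": ["pop"], "c": ["rock", "pop"], "d": ["jazz"]}
item_pop = {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.05}

profile = ["a", "d"]  # a user with little interest in popular items
recs = ["b", "c"]     # recommendations skewed towards popular items

print(round(miscalibration(profile, recs, item_genres), 3))  # ~2.644
print(round(popularity_lift(profile, recs, item_pop), 3))    # 4.333
```

A value of 0 for miscalibration means the recommended genre mix matches the profile, and a positive popularity lift means the recommendations are more popular than what the user consumed. Averaging these per-user values within each user group would give one possible group-level comparison of the kind the abstract describes.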