Several studies have identified discrepancies between the popularity of items in user profiles and the popularity of items in the corresponding recommendation lists. Such behavior, which affects a variety of recommendation algorithms, is referred to as popularity bias. Existing work predominantly adopts simple statistical measures, such as the difference in mean or median popularity, to quantify popularity bias. Moreover, it does so irrespective of user characteristics other than the inclination towards popular content. In this work, in contrast, we propose to investigate popularity differences (between the user profile and the recommendation list) in terms of the median, a variety of statistical moments, and similarity measures that consider the entire popularity distributions (Kullback-Leibler divergence and Kendall's tau rank-order correlation). This results in a more detailed picture of the characteristics of popularity bias. Furthermore, we investigate whether such algorithmic popularity bias affects users of different genders in the same way. We focus on music recommendation and conduct experiments on the recently released standardized LFM-2b dataset, which contains listening profiles of Last.fm users. We investigate the algorithmic popularity bias of seven common recommendation algorithms (five collaborative filtering algorithms and two baselines). Our experiments show that (1) the studied metrics provide novel insights into popularity bias in comparison with only using average differences, (2) algorithms less inclined towards popularity bias amplification do not necessarily perform worse in terms of utility (NDCG), and (3) the majority of the investigated recommenders intensify the popularity bias for female users.
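To make the families of measures named above concrete, the following is a minimal Python sketch for a single user, not the authors' implementation. It assumes that each item has a numeric popularity score, that the two distributions are compared as equal-width histograms over a shared range, that the divergence is taken in the direction KL(profile || recommendations), and that Kendall's tau is computed on the per-bin counts. The function names, the bin count, and the smoothing constant `eps` are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, median


def moments(values):
    """Mean, population variance, skewness and kurtosis of a list of scores."""
    m = mean(values)
    n = len(values)
    var = sum((v - m) ** 2 for v in values) / n
    sd = math.sqrt(var)
    skew = sum((v - m) ** 3 for v in values) / n / sd ** 3 if sd else 0.0
    kurt = sum((v - m) ** 4 for v in values) / n / var ** 2 if sd else 0.0
    return {"mean": m, "variance": var, "skewness": skew, "kurtosis": kurt}


def histogram(values, lo, hi, n_bins):
    """Equal-width bin counts over [lo, hi]; the top edge falls in the last bin."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) over bin counts; eps smoothing (an assumption) avoids log(0)."""
    p = [c + eps for c in p_counts]
    q = [c + eps for c in q_counts]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


def kendall_tau_b(x, y):
    """Tie-corrected Kendall's tau-b; returns NaN if either input is constant."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both sequences: contributes to neither term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def popularity_bias_summary(profile_pop, rec_pop, n_bins=5):
    """Compare item popularity in a user profile with that in a recommendation list."""
    lo = min(min(profile_pop), min(rec_pop))
    hi = max(max(profile_pop), max(rec_pop))
    if hi == lo:
        hi = lo + 1.0  # degenerate case: all scores identical
    p_hist = histogram(profile_pop, lo, hi, n_bins)
    r_hist = histogram(rec_pop, lo, hi, n_bins)
    p_mom, r_mom = moments(profile_pop), moments(rec_pop)
    summary = {"delta_" + k: r_mom[k] - p_mom[k] for k in p_mom}
    summary["delta_median"] = median(rec_pop) - median(profile_pop)
    summary["kl"] = kl_divergence(p_hist, r_hist)
    summary["kendall_tau"] = kendall_tau_b(p_hist, r_hist)
    return summary


# Hypothetical popularity scores in [0, 1] for one user.
profile = [0.1, 0.2, 0.2, 0.5, 0.9]
recommended = [0.6, 0.8, 0.9, 0.9, 1.0]
for name, value in popularity_bias_summary(profile, recommended).items():
    print(name, value)
```

In this sketch, positive `delta_` values mean that the recommendation list is shifted towards more popular items than the profile. The KL divergence and Kendall's tau add information that mean or median differences cannot capture, because they compare the shapes of the two distributions rather than a single summary statistic.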