Recommender systems trained on implicit feedback data rely on negative sampling to distinguish positive items from negative items for each user. Because most positive interactions come from a small group of active users, negative samplers are often affected by this data imbalance: they choose more informative negatives for active users and less useful ones for inactive users. As a result, inactive users are further marginalised during training and receive inferior recommendations. In this paper, we conduct a comprehensive empirical study demonstrating that state-of-the-art negative sampling strategies provide more accurate recommendations for active users than for inactive users. We also find that increasing the number of negative samples per positive item improves average performance, but the benefit is distributed unequally across user groups: active users gain performance, while inactive users suffer performance degradation. To address this, we propose a group-specific negative sampling strategy that assigns smaller negative ratios to inactive user groups and larger ratios to active groups. Experiments on eight negative samplers show that our approach improves both user-side fairness and performance compared with a uniform global ratio.
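The idea of a group-specific negative ratio can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify how user groups are formed, which ratios are used, or which samplers are evaluated. Here users are split into equal-sized groups by interaction count, a uniform random sampler stands in for any of the studied samplers, and the names `assign_groups`, `sample_negatives`, and `ratio_per_group` are hypothetical.

```python
import random


def assign_groups(interactions, num_groups=2):
    """Rank users by interaction count and split them into equal-sized groups.

    Group 0 holds the least active users (assumed grouping rule).
    """
    ranked = sorted(interactions, key=lambda u: (len(interactions[u]), u))
    size = -(-len(ranked) // num_groups)  # ceiling division
    return {user: idx // size for idx, user in enumerate(ranked)}


def sample_negatives(interactions, all_items, group_of, ratio_per_group, seed=0):
    """Draw negatives per positive, using the ratio of the user's group.

    Uniform sampling over non-interacted items is a stand-in sampler.
    Returns a list of (user, positive_item, negative_item) triples.
    """
    rng = random.Random(seed)
    triples = []
    for user, positives in interactions.items():
        candidates = sorted(all_items - positives)
        if not candidates:
            continue  # user has interacted with every item
        ratio = ratio_per_group[group_of[user]]
        for pos in sorted(positives):
            for _ in range(ratio):
                triples.append((user, pos, rng.choice(candidates)))
    return triples


# Toy data: two inactive users (u2, u4) and two active users (u1, u3).
interactions = {"u1": {1, 2, 3, 4}, "u2": {1}, "u3": {2, 5, 6}, "u4": {3}}
all_items = set(range(1, 11))

group_of = assign_groups(interactions, num_groups=2)
# Smaller ratio for the inactive group (0), larger for the active group (1).
ratio_per_group = {0: 1, 1: 4}
triples = sample_negatives(interactions, all_items, group_of, ratio_per_group)
```

A uniform global ratio corresponds to giving every group the same value in `ratio_per_group`; the group-specific variant only changes that mapping, so it can in principle be combined with any underlying sampler.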