Evaluating and Addressing Fairness Across User Groups in Negative Sampling for Recommender Systems

Recommender systems trained on implicit feedback rely on negative sampling to distinguish positive items from negative items for each user. Because most positive interactions come from a small group of active users, negative samplers are affected by this data imbalance: they tend to choose more informative negatives for active users and less useful ones for inactive users. Inactive users are thereby further marginalised during training and receive inferior recommendations. In this paper, we conduct a comprehensive empirical study showing that state-of-the-art negative sampling strategies provide more accurate recommendations for active users than for inactive users. We also find that increasing the number of negative samples per positive item improves average performance, but the benefit is distributed unequally across user groups: active users gain performance while inactive users suffer degradation. To address this, we propose a group-specific negative sampling strategy that assigns smaller negative ratios to inactive user groups and larger ratios to active groups. Experiments on eight negative samplers show that our approach improves both user-side fairness and performance compared with a uniform global ratio.
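The idea of a group-specific negative ratio can be illustrated with a minimal sketch. The abstract does not say how users are grouped, which ratios are used, or which sampler is applied, so everything below is an assumption made for illustration: the function names `assign_groups` and `sample_training_triples` are hypothetical, users are split into equal-size groups by interaction count, and negatives are drawn uniformly at random from items the user has not interacted with.

```python
import random
from collections import defaultdict


def assign_groups(interactions, n_groups=2):
    """Rank users by interaction count and split them into equal-size groups.

    Group 0 holds the least active users, the last group the most active.
    (Assumed grouping rule; the abstract does not specify one.)
    """
    counts = defaultdict(int)
    for user, _ in interactions:
        counts[user] += 1
    # Sort by activity, breaking ties by user id so the result is deterministic.
    ranked = sorted(counts, key=lambda u: (counts[u], u))
    return {u: i * n_groups // len(ranked) for i, u in enumerate(ranked)}


def sample_training_triples(interactions, n_items, group_ratios, rng):
    """Build (user, positive, negative) triples with a per-group negative ratio.

    group_ratios[g] is the number of negatives drawn per positive for group g.
    Negatives are sampled uniformly (an assumption, standing in for any sampler).
    Assumes every user has at least one item they have not interacted with.
    """
    positives = defaultdict(set)
    for user, item in interactions:
        positives[user].add(item)
    groups = assign_groups(interactions, n_groups=len(group_ratios))

    triples = []
    for user, pos in interactions:
        ratio = group_ratios[groups[user]]
        for _ in range(ratio):
            neg = rng.randrange(n_items)
            # Reject items the user has already interacted with.
            while neg in positives[user]:
                neg = rng.randrange(n_items)
            triples.append((user, pos, neg))
    return triples
```

With `group_ratios=[1, 4]`, for example, users in the less active half receive one negative per positive and users in the more active half receive four, which mirrors the direction described in the abstract (smaller ratios for inactive groups, larger for active ones) without claiming to reproduce the paper's settings.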


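Related content

Negative sampling

In natural language processing, the task is to judge whether two words form a pair of context word and target word: a true pair is a positive example, and a non-pair is a negative example. A positive example is generated by sampling a context word together with its target word. A negative example is generated by keeping the same context word as the positive example and pairing it with a word chosen at random from the vocabulary. This procedure is called negative sampling.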
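The word-pair procedure described above can be sketched in a few lines. This is an illustration under stated assumptions: the function name `make_examples` and the parameter `k` (number of negatives per positive) are hypothetical, words are drawn uniformly from the vocabulary, and the true target word is excluded from the negative candidates, which the description above does not require.

```python
import random


def make_examples(context, target, vocabulary, k, rng):
    """Return one positive (context, target, 1) example followed by k negatives.

    Each negative keeps the same context word and pairs it with a random
    vocabulary word, labelled 0.
    """
    examples = [(context, target, 1)]
    # Assumption: exclude the true target so a negative never repeats the positive.
    candidates = [w for w in vocabulary if w != target]
    for _ in range(k):
        examples.append((context, rng.choice(candidates), 0))
    return examples
```

Calling `make_examples("orange", "juice", vocabulary, k=4, rng=random.Random(0))` on a small vocabulary would yield one pair labelled 1 and four pairs labelled 0 that all share the context word "orange".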
