Exposure bias is a well-known issue in recommender systems: items and suppliers are not equally represented in recommendation results. The problem is especially serious when the bias is amplified over time: a few items (e.g., popular ones) are repeatedly over-represented in recommendation lists, and users' interactions with those items further increase the bias toward them, creating a feedback loop. This issue has been studied extensively for model-based and neighborhood-based recommendation algorithms, but less work has addressed online recommendation models, such as those based on top-K contextual bandits, in which the model is updated dynamically with ongoing user feedback. In this paper, we study exposure bias in a well-known class of contextual bandit algorithms, Linear Cascading Bandits. We analyze how well these algorithms handle exposure bias and provide fair representation of items in the recommendation results. Our analysis reveals that these algorithms tend to amplify exposure disparity among items over time. In particular, we observe that they do not properly adapt to user feedback and frequently recommend certain items even when users do not select those items. To mitigate this bias, we propose an Exposure-Aware (EA) reward model that updates the model parameters based on two factors: 1) user feedback (i.e., clicked or not), and 2) the position of the item in the recommendation list. In this way, the proposed model controls the utility assigned to items based on their exposure in the recommendation list. Extensive experiments on two real-world datasets using three contextual bandit algorithms show that the proposed reward model reduces exposure bias amplification in the long run while maintaining recommendation accuracy.
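To make the idea of a reward that depends on both click feedback and list position concrete, the following is a minimal sketch and not the formulation used in the paper. The log-discount exposure weight, the `gamma` parameter, and the function names `exposure_weight` and `exposure_aware_rewards` are all assumptions introduced for illustration. The sketch also assumes the standard cascade behavior, in which the user scans the list from the top and stops at the first click.

```python
import math


def exposure_weight(position: int) -> float:
    """Exposure of a 1-indexed list position (assumed log-discount form)."""
    return 1.0 / math.log2(position + 1)


def exposure_aware_rewards(list_len, clicked_position=None, gamma=0.5):
    """Return {position: reward} for the positions the user examined.

    Hypothetical reward shape, for illustration only:
      - clicked item: 1 plus a bonus that grows as exposure shrinks
      - examined but skipped item: a penalty proportional to its exposure
    Positions below the click are unobserved under the cascade assumption,
    so they get no entry (and would trigger no model update).
    """
    if clicked_position is not None and not 1 <= clicked_position <= list_len:
        raise ValueError("clicked_position must lie within the list")

    # With no click, the user is assumed to have examined the whole list.
    last_examined = clicked_position if clicked_position is not None else list_len

    rewards = {}
    for pos in range(1, last_examined + 1):
        w = exposure_weight(pos)
        if pos == clicked_position:
            rewards[pos] = 1.0 + gamma * (1.0 - w)
        else:
            rewards[pos] = -gamma * w
    return rewards


if __name__ == "__main__":
    # A 5-item list where the user clicks the item at position 3.
    for pos, reward in exposure_aware_rewards(5, clicked_position=3).items():
        print(pos, round(reward, 4))
```

Under these assumptions, an item that is shown at the top of the list and skipped is penalized more than one skipped further down, while a click on a low-exposure position earns slightly more than a click at the top. That is one way a reward could discourage repeatedly recommending highly exposed items that users do not select.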