Bias and Debias in Recommender System: A Survey and Future Directions

While recent years have witnessed rapid growth in research papers on recommender systems (RS), most of these papers focus on inventing machine learning models to better fit user behavior data. However, user behavior data is observational rather than experimental. As a result, various biases are widespread in the data, including but not limited to selection bias, position bias, exposure bias, and popularity bias. Blindly fitting the data without considering these inherent biases leads to serious issues, e.g., discrepancies between offline evaluation and online metrics, and reduced user satisfaction with and trust in the recommendation service. To transform the large volume of research models into practical improvements, it is urgent to explore the impacts of these biases and to perform debiasing when necessary. When reviewing the papers that consider biases in RS, we find that, to our surprise, the studies are rather fragmented and lack a systematic organization. The term "bias" is widely used in the literature, but its definition is usually vague and even inconsistent across papers. This motivates us to provide a systematic survey of existing work on RS biases. In this paper, we first summarize seven types of biases in recommendation, along with their definitions and characteristics. We then provide a taxonomy to position and organize the existing work on recommendation debiasing. Finally, we identify some open challenges and envision some future directions, with the hope of inspiring more research on this important yet under-investigated topic. The summary of debiasing methods reviewed in this survey can be found at https://github.com/jiawei-chen/RecDebiasing.
