Measuring Recommender System Effects with Simulated Users

From arXiv. Presented at the Second Workshop on Fairness, Accountability, Transparency, Ethics and Society on the Web (FATES 2020) under the title "Beyond Next Step Bias: Trajectory Simulation for Understanding Recommender System Behavior".

Imagine a food recommender system: how would we check whether it is causing and fostering unhealthy eating habits or merely reflecting users' interests? How much of a user's experience over time with a recommender is caused by the recommender system's choices and biases, and how much is based on the user's preferences and biases? Popularity bias and filter bubbles are two of the most well-studied recommender system biases, but most of the prior research has focused on understanding the system behavior in a single recommendation step. How do these biases interplay with user behavior, and what types of user experiences are created from repeated interactions? In this work, we offer a simulation framework for measuring the impact of a recommender system under different types of user behavior. Using this simulation framework, we can (a) isolate the effect of the recommender system from the user preferences, and (b) examine how the system performs not just on average for an "average user" but also the extreme experiences under atypical user behavior. As part of the simulation framework, we propose a set of evaluation metrics over the simulations to understand the recommender system's behavior. Finally, we present two empirical case studies, one on traditional collaborative filtering in MovieLens and one on a large-scale production recommender system, to understand how popularity bias manifests over time.
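
To make the idea of trajectory simulation concrete, here is a minimal Python sketch. It is not the framework or the metrics from the paper: the popularity-based toy recommender, the two user policies ("uniform" and "first"), the no-recommender baseline, and the `distinct_items` measure are all assumptions chosen for illustration. The sketch only shows the general pattern of holding the user model fixed while switching the recommender on and off, then comparing what the user ends up consuming over many steps.

```python
import random


def simulate(n_items=50, n_steps=200, slate_size=5,
             use_recommender=True, user_policy="uniform", seed=0):
    """Simulate one user trajectory; return the list of consumed item ids.

    All parameters and policies here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    # Assumed starting state: every item has been consumed exactly once.
    counts = [1] * n_items
    consumed = []
    for _ in range(n_steps):
        if use_recommender:
            # Toy popularity recommender: show the most-consumed items,
            # breaking ties by item id.
            ranked = sorted(range(n_items), key=lambda i: (-counts[i], i))
            slate = ranked[:slate_size]
        else:
            # Baseline without a recommender: the user sees the full catalog.
            slate = list(range(n_items))
        if user_policy == "first":
            choice = slate[0]           # atypical user: always takes the top slot
        else:
            choice = rng.choice(slate)  # picks uniformly among the shown items
        counts[choice] += 1             # consumption feeds back into popularity
        consumed.append(choice)
    return consumed


def distinct_items(trajectory):
    """Number of different items the user consumed along the trajectory."""
    return len(set(trajectory))


if __name__ == "__main__":
    for policy in ("uniform", "first"):
        with_rec = distinct_items(simulate(user_policy=policy))
        print(policy, "with recommender:", with_rec)
    baseline = distinct_items(simulate(use_recommender=False))
    print("uniform without recommender:", baseline)
```

In this toy setup the same "uniform" user explores far fewer items once the popularity recommender mediates what is shown, and the "first" user collapses onto a single item. That is one simple way to see a feedback loop that a single-step evaluation would miss.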
