Personalized recommender systems suffuse modern life, shaping what media we read and what products we consume. The algorithms powering such systems tend to consist of supervised learning-based heuristics, such as latent factor models with a variety of heuristically chosen prediction targets. Meanwhile, theoretical treatments of recommendation frequently address the decision-theoretic nature of the problem, including the need to balance exploration and exploitation, via the multi-armed bandits (MABs) framework. However, MAB-based approaches rely heavily on assumptions about human preferences. These preference assumptions are seldom tested using human subject studies, partly due to the lack of publicly available toolkits to conduct such studies. In this work, we conduct a study with crowdworkers in a comics recommendation MAB setting. Each arm represents a comic category, and users provide feedback after each recommendation. We test a core MAB assumption, namely that human preferences (reward distributions) are fixed over time, and find that it does not hold. This finding suggests that any MAB algorithm used for recommender systems should account for human preference dynamics. In answering these questions, we provide a flexible experimental framework for understanding human preference dynamics and testing MAB algorithms with human users. The code for our experimental framework and the collected data can be found at https://github.com/HumainLab/human-bandit-evaluation.
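To make the stationarity assumption concrete, here is a minimal sketch of an epsilon-greedy MAB whose incremental-mean update is only justified when each arm's reward distribution is fixed over time. This is an illustrative example under assumed names, not the paper's implementation:

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy MAB that assumes each arm's reward
    distribution (the user's preference) is fixed over time."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # pulls per arm
        self.values = [0.0] * n_arms    # running mean reward per arm

    def select_arm(self):
        # With probability epsilon, explore a random arm;
        # otherwise exploit the arm with the highest estimated mean.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update: an unbiased estimate only under
        # stationary rewards -- the assumption the study tests.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

If preferences drift, a fixed-weight update (e.g. `values[arm] += alpha * (reward - values[arm])`) that discounts old feedback is one standard adjustment.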