We propose deviation-based learning, a new approach to training recommender systems. Initially, the recommender and rational users hold different pieces of knowledge, and the recommender must learn the users' knowledge to make better recommendations. She does so by observing whether each user follows or deviates from her recommendation. We show that learning frequently stalls if the recommender always recommends a choice: users tend to follow the recommendation blindly, so their choices do not reflect their knowledge. Social welfare and the learning rate improve drastically if the recommender abstains from recommending a choice when she predicts that multiple arms (candidate choices) will produce similar payoffs.
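As a rough illustration of the abstention idea only, the sketch below recommends the arm with the highest predicted payoff unless the top two predictions are close, in which case it abstains. The function name `recommend`, the `margin` parameter, and the rule of comparing only the top two arms are assumptions made for this example; they are not taken from the paper's model.

```python
from typing import Optional, Sequence


def recommend(predicted_payoffs: Sequence[float], margin: float) -> Optional[int]:
    """Return the index of the arm to recommend, or None to abstain.

    Hypothetical rule: abstain when the gap between the two highest
    predicted payoffs is smaller than `margin` (an assumed threshold).
    """
    if len(predicted_payoffs) < 2:
        raise ValueError("need at least two arms to compare")

    # Rank arm indices from highest to lowest predicted payoff.
    ranked = sorted(
        range(len(predicted_payoffs)),
        key=lambda i: predicted_payoffs[i],
        reverse=True,
    )
    best, runner_up = ranked[0], ranked[1]

    # Predictions are too close to call: leave the choice to the user,
    # so that the user's action reflects her own knowledge.
    if predicted_payoffs[best] - predicted_payoffs[runner_up] < margin:
        return None
    return best
```

When `recommend` returns `None`, the user chooses without guidance, so her choice is informative about what she knows; when it returns an index, a deviation from that arm is the informative event.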