In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions. Due to the scarcity of explicit user feedback, modern recommender systems typically optimize for the same fixed combination of implicit feedback signals across all users. However, this approach disregards a growing body of work highlighting that (i) implicit signals are used in diverse ways across users, expressing anything from satisfaction to active dislike, and (ii) different users communicate their preferences in different ways. We propose applying the recent Interaction Grounded Learning (IGL) paradigm to address the challenge of learning representations of diverse user communication modalities. Rather than relying on a fixed, human-designed reward function, IGL is able to learn personalized reward functions for different users and then optimize directly for latent user satisfaction. We demonstrate the success of IGL in experiments on both simulations and real-world production traces.
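As a minimal sketch of the setting (our notation, not taken from the abstract itself): on each interaction the system observes a context $x$, recommends an item $a \sim \pi(\cdot \mid x)$, and receives only an implicit feedback signal $y$; the latent satisfaction $r \in \{0,1\}$ is never observed. In the original full-conditional-independence formulation of IGL, the learner jointly fits a reward decoder $\psi$ and a policy $\pi$, for example

$$
\max_{\pi,\,\psi}\; \mathbb{E}_{x,\; a\sim\pi(\cdot\mid x),\; y\sim P(\cdot\mid x,a)}\!\left[\psi(y)\right]
\qquad \text{under the assumption} \qquad y \perp (x,a) \mid r,
$$

so that $\psi(y)$ serves as a learned estimate of the latent reward that the policy then optimizes. The personalized setting described here would learn such a decoding per user (or per communication modality) rather than a single global one; the exact assumptions and objective used in the paper may differ from this sketch.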