In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions. Due to the scarcity of explicit user feedback, modern recommender systems typically optimize for the same fixed combination of implicit feedback signals across all users. However, this approach disregards a growing body of work highlighting that (i) users can employ implicit signals in diverse ways, signaling anything from satisfaction to active dislike, and (ii) different users communicate preferences in different ways. We propose applying the recent Interaction Grounded Learning (IGL) paradigm to address the challenge of learning representations of diverse user communication modalities. Rather than requiring a fixed, human-designed reward function, IGL can learn personalized reward functions for different users and then optimize directly for the latent user satisfaction. We demonstrate the effectiveness of IGL in experiments on simulations as well as on real-world production traces.