So far, most research on recommender systems has focused on maintaining long-term user engagement and satisfaction by promoting relevant and personalized content. However, evaluating the quality and reliability of this content remains very challenging. In this paper, we propose FEBR (Expert-Based Recommendation Framework), an apprenticeship learning framework for assessing the quality of content recommended on online platforms. The framework exploits the demonstrated trajectories of an expert (assumed to be reliable) in a recommendation evaluation environment to recover an unknown utility function. This function is used to learn an optimal policy describing the expert's behavior, which the framework then uses to provide high-quality, personalized recommendations. We evaluate our solution in a user interest simulation environment built with RecSim. We simulate interactions under the aforementioned expert policy for video recommendation and compare its efficiency with standard recommendation methods. The results show that our approach provides a significant gain in content quality, as evaluated by experts and watched by users, while maintaining almost the same watch time as the baseline approaches.
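To make the idea of recovering a utility function from expert trajectories more concrete, the following is a minimal sketch and not the FEBR implementation. It assumes a utility that is linear in hand-picked features and takes a single feature-expectation-matching step of the kind common in apprenticeship learning. The two features (quality, engagement), the toy trajectories, the discount factor, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# A trajectory is a list of feature vectors, one per recommended item.
# Assumed features for this sketch: (quality, engagement).
Trajectory = List[Sequence[float]]


def feature_expectations(trajectories: List[Trajectory], gamma: float) -> List[float]:
    """Average discounted sum of feature vectors over a set of trajectories."""
    dim = len(trajectories[0][0])
    mu = [0.0] * dim
    for traj in trajectories:
        for t, phi in enumerate(traj):
            for i in range(dim):
                mu[i] += (gamma ** t) * phi[i]
    return [m / len(trajectories) for m in mu]


def utility_weights(mu_expert: Sequence[float], mu_policy: Sequence[float]) -> List[float]:
    """One matching step: weights point from the current policy toward the expert."""
    return [e - p for e, p in zip(mu_expert, mu_policy)]


def utility(weights: Sequence[float], phi: Sequence[float]) -> float:
    """Linear utility of a single item described by feature vector phi."""
    return sum(w * x for w, x in zip(weights, phi))


# Toy data (hypothetical): the expert picks high-quality items,
# the baseline picks equally engaging but low-quality items.
expert_trajs = [[(1.0, 0.5), (1.0, 0.5)]]
baseline_trajs = [[(0.0, 0.5), (0.0, 0.5)]]

mu_expert = feature_expectations(expert_trajs, gamma=0.5)
mu_baseline = feature_expectations(baseline_trajs, gamma=0.5)
weights = utility_weights(mu_expert, mu_baseline)

# The recovered utility ranks the expert-style item above the baseline-style item.
print(utility(weights, (1.0, 0.5)), utility(weights, (0.0, 0.5)))
```

In a full apprenticeship learning loop, a step like this would alternate with training a policy against the current utility estimate until the policy's feature expectations approach the expert's. The sketch stops after the first step to keep the mechanism visible.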