We study the problem of optimizing a recommender system for outcomes that occur over several weeks or months. We begin by drawing on reinforcement learning to formulate a comprehensive model of users' recurring relationships with a recommender system. Measurement, attribution, and coordination challenges complicate algorithm design. We describe careful modeling -- including a new representation of user state and key conditional independence assumptions -- which overcomes these challenges and leads to simple, testable recommender system prototypes. We apply our approach to a podcast recommender system that makes personalized recommendations to hundreds of millions of listeners. A/B tests demonstrate that purposefully optimizing for long-term outcomes leads to large performance gains over conventional approaches that optimize for short-term proxies.