Recently, short video platforms have achieved rapid user growth by recommending engaging content to users. The objective of recommendation is to optimize user retention, thereby driving the growth of DAU (Daily Active Users). Retention is a long-term feedback signal that emerges only after multiple interactions between users and the system, and it is hard to decompose the retention reward to each item or list of items. Traditional point-wise and list-wise models are therefore unable to optimize retention. In this paper, we choose reinforcement learning methods to optimize retention, as they are designed to maximize long-term performance. We formulate the problem as an infinite-horizon request-based Markov Decision Process (MDP), where the objective is to minimize the accumulated time interval between sessions, which is equivalent to improving the app open frequency and user retention. However, current reinforcement learning algorithms cannot be directly applied in this setting due to the uncertainty, bias, and long delay inherent in the retention signal. We propose a novel method, dubbed RLUR, to address these challenges. Both offline and live experiments show that RLUR significantly improves user retention. RLUR has been fully deployed in the Kuaishou app for a long time, and achieves consistent performance improvement on user retention and DAU.
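The stated objective can be sketched as follows. This is a minimal formalization assuming notation not defined in this excerpt: index user requests by $i$, let $d_i$ denote the time interval until the user's next session after request $i$, and let $\gamma$ be a discount factor for the infinite horizon.

```latex
% Hypothetical sketch of the retention objective: minimize the expected
% discounted sum of inter-session time intervals under policy \pi.
\min_{\pi} \; J(\pi)
  = \mathbb{E}_{\pi}\!\left[ \sum_{i=0}^{\infty} \gamma^{i} \, d_i \right],
% where d_i is the time until the next session return; a smaller J(\pi)
% corresponds to more frequent app opens and hence better retention.
```

Equivalently, this is a standard RL return with per-request reward $r_i = -d_i$, which is why long-horizon methods apply where point-wise and list-wise models do not.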