Mobile notification systems play a major role in many applications, communicating with users and sending them alerts and reminders about news, events, or messages. In this paper, we formulate the near-real-time notification decision problem as a Markov Decision Process in which the reward optimizes for multiple objectives. We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions. To address the challenges of offline learning, we use a Double Deep Q-network method based on Conservative Q-learning, which mitigates distributional shift and Q-value overestimation. We describe our fully deployed system and demonstrate the performance and benefits of the proposed approach through both offline and online experiments.
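The loss combining a Double DQN temporal-difference target with a conservative (CQL) penalty can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes discrete notification actions, a logged offline batch of transitions, and precomputed Q-value arrays; the names `alpha` and `gamma` are illustrative hyperparameters.

```python
import numpy as np

def cql_double_dqn_loss(q_s, q_next_online, q_next_target,
                        actions, rewards, dones,
                        gamma=0.99, alpha=1.0):
    """TD loss with a conservative (CQL) penalty on a logged batch.

    q_s           : (B, A) online-network Q-values at states s
    q_next_online : (B, A) online-network Q-values at next states s'
    q_next_target : (B, A) target-network Q-values at next states s'
    actions       : (B,)   actions actually taken in the logged data
    """
    B = q_s.shape[0]
    q_sa = q_s[np.arange(B), actions]

    # Double DQN target: the online network selects the next action,
    # the target network evaluates it, reducing overestimation.
    next_a = np.argmax(q_next_online, axis=1)
    q_next = q_next_target[np.arange(B), next_a]
    td_target = rewards + gamma * (1.0 - dones) * q_next
    td_loss = np.mean((q_sa - td_target) ** 2)

    # Conservative penalty: push down Q-values over all actions
    # (via logsumexp) while pushing up Q-values of logged actions,
    # discouraging overestimation on out-of-distribution actions.
    m = q_s.max(axis=1, keepdims=True)
    logsumexp_q = m.squeeze(1) + np.log(np.exp(q_s - m).sum(axis=1))
    cql_penalty = np.mean(logsumexp_q - q_sa)

    return td_loss + alpha * cql_penalty
```

Setting `alpha=0` recovers a plain Double DQN loss; increasing `alpha` strengthens the conservatism against out-of-distribution actions.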