Scheduled batch jobs are widely used on asynchronous computing platforms to execute various enterprise applications, including scheduled notifications and candidate pre-computation for modern recommender systems. Delivering or updating information for users at the right time is important for maintaining both the user experience and the impact of each execution. However, it is challenging to provide a versatile execution time optimization solution for per-user scheduled jobs that satisfies various product scenarios while keeping infrastructure resource consumption reasonable. In this paper, we describe how we apply a learning-to-rank approach together with a "best time policy" to best time selection. In addition, we propose an ensemble learner that minimizes the ranking loss by efficiently leveraging multiple streams of user activity signals in our execution time scheduling decisions. In particular, we observe cannibalization across use cases that compete for a user's peak time slot, and we introduce a coordination system to mitigate the problem. Our optimization approach has been successfully tested with production traffic that serves billions of users per day, with statistically significant improvements in various product metrics, including those for notifications and content candidate generation. To the best of our knowledge, our study represents the first ML-based multi-tenant solution to the execution time optimization problem for scheduled jobs at a large industrial scale across different product domains.
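As a purely illustrative sketch of the pipeline summarized above, the following Python code ranks a user's hourly slots from several activity signal streams and then coordinates two use cases that would otherwise compete for the same peak slot. Everything here is an assumption made for illustration: the fixed weighted average stands in for the learned ensemble ranker, the greedy priority rule stands in for the coordination system, and the names `ensemble_scores`, `rank_slots`, and `coordinate` are hypothetical. None of it is taken from the production system.

```python
HOURS = range(24)  # assumed granularity: one candidate slot per hour of the day


def ensemble_scores(signal_streams, weights):
    """Combine per-hour scores from several signal streams.

    A fixed weighted average is used here as a stand-in for a learned
    ensemble; the real learner is not specified in the abstract.
    """
    total = sum(weights[name] for name in signal_streams)
    return [
        sum(weights[name] * signal_streams[name][h] for name in signal_streams) / total
        for h in HOURS
    ]


def rank_slots(scores):
    """Return hours ordered from best to worst; ties go to the earlier hour."""
    return sorted(HOURS, key=lambda h: (-scores[h], h))


def coordinate(ranked_by_use_case, priority):
    """Greedy coordination (an assumed policy): use cases pick in priority
    order, and each takes its best-ranked slot that is still free."""
    taken = set()
    assignment = {}
    for use_case in priority:
        for h in ranked_by_use_case[use_case]:
            if h not in taken:
                assignment[use_case] = h
                taken.add(h)
                break
    return assignment


# Hypothetical signals for one user: activity peaks in the evening.
app_opens = [0.0] * 24
app_opens[20], app_opens[8] = 1.0, 0.5
clicks = [0.0] * 24
clicks[20], clicks[12] = 0.8, 0.6

scores = ensemble_scores(
    {"app_opens": app_opens, "clicks": clicks},
    {"app_opens": 0.5, "clicks": 0.5},
)
ranking = rank_slots(scores)
plan = coordinate(
    {"notifications": ranking, "candidate_generation": ranking},
    priority=["notifications", "candidate_generation"],
)
```

In this toy setup both use cases prefer the same peak hour, so the lower-priority one is moved to its next-best slot instead of colliding with the first. A production policy would presumably weigh the metric cost of that displacement rather than apply a fixed priority order.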