One of the most well-established applications of machine learning is deciding what content to show website visitors. When observation data comes from high-velocity, user-generated data streams, machine learning methods must balance model complexity, training time, and computational cost. Furthermore, when model freshness is critical, training becomes time-constrained. Parallelized batch offline training, although horizontally scalable, is often neither timely nor cost-effective. In this paper, we propose Lambda Learner, a new framework for training models via incremental updates in response to mini-batches from a data stream. We show that the model produced by our framework closely approximates a periodically retrained model trained on offline data, and outperforms it when model updates are time-sensitive. We provide a theoretical proof that the incremental learning updates improve the loss function over a stale batch model. We present a large-scale deployment on the sponsored content platform of a large social network, serving hundreds of millions of users across different channels (e.g., desktop, mobile). We address challenges and complexities from both the algorithmic and infrastructure perspectives, and illustrate the system details for computation, storage, and streaming production of training data.
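To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch of the general pattern it describes: warm-start from a stale offline (batch) model and refine it with an update per streamed mini-batch, so freshness does not require a full retrain. The logistic-loss gradient step, learning rate, and L2 term below are illustrative assumptions for this sketch, not the paper's actual update rule, which is developed in the body of the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def incremental_update(w, X, y, lr=0.1, l2=1e-4):
    """One incremental update on a streamed mini-batch (X, y).

    w is the current weight vector, warm-started from the stale
    batch model; the offline model itself is left untouched.
    """
    p = sigmoid(X @ w)                       # predicted probabilities
    grad = X.T @ (p - y) / len(y) + l2 * w   # logistic loss gradient + L2
    return w - lr * grad

# Toy usage: warm-start from an offline model, then consume a stream.
rng = np.random.default_rng(0)
d = 8
w_batch = rng.normal(size=d)   # stands in for the periodically trained batch model
w = w_batch.copy()
for _ in range(100):           # each iteration = one mini-batch from the stream
    X = rng.normal(size=(32, d))
    y = (sigmoid(X @ np.ones(d)) > rng.random(32)).astype(float)
    w = incremental_update(w, X, y)
```

Under this pattern, each mini-batch update is cheap relative to a full offline retrain, which is what makes the incrementally refreshed model competitive when updates are time-sensitive.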