One of the most well-established applications of machine learning is deciding what content to show website visitors. When observation data comes from high-velocity, user-generated data streams, machine learning methods must balance model complexity, training time, and computational cost. Furthermore, when model freshness is critical, training becomes time-constrained. Parallelized batch offline training, although horizontally scalable, is often neither timely nor cost-effective. In this paper, we propose Lambda Learner, a new framework for training models via incremental updates in response to mini-batches from a data stream. We show that the model produced by our framework closely approximates a periodically retrained model trained on offline data, and outperforms it when model updates are time-sensitive. We provide a theoretical proof that the incremental learning updates improve the loss function over a stale batch model. We present a large-scale deployment on the sponsored content platform of a large social network, serving hundreds of millions of users across different channels (e.g., desktop, mobile). We address challenges and complexities from both the algorithmic and infrastructure perspectives, and illustrate the system details for computation, storage, and streaming production of training data.
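To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch of the general pattern it describes: warm-start from a stale offline (batch) model and refine it with an update per streamed mini-batch, so freshness does not require a full retrain. The logistic-loss gradient step, learning rate, and L2 term below are illustrative assumptions for this sketch, not the paper's actual update rule, which is developed in the body of the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def incremental_update(w, X, y, lr=0.1, l2=1e-4):
    """One incremental update on a streamed mini-batch (X, y).

    w is the current weight vector, warm-started from the stale
    batch model; the offline model itself is left untouched.
    """
    p = sigmoid(X @ w)                       # predicted probabilities
    grad = X.T @ (p - y) / len(y) + l2 * w   # logistic loss gradient + L2
    return w - lr * grad

# Toy usage: warm-start from an offline model, then consume a stream.
rng = np.random.default_rng(0)
d = 8
w_batch = rng.normal(size=d)   # stands in for the periodically trained batch model
w = w_batch.copy()
for _ in range(100):           # each iteration = one mini-batch from the stream
    X = rng.normal(size=(32, d))
    y = (sigmoid(X @ np.ones(d)) > rng.random(32)).astype(float)
    w = incremental_update(w, X, y)
```

Under this pattern, each mini-batch update is cheap relative to a full offline retrain, which is what makes the incrementally refreshed model competitive when updates are time-sensitive.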