Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users, infrastructure, and other systems. For broader adoption, this practice must (i) accommodate product engineers without ML backgrounds, (ii) support fine-grained product-metric evaluation, and (iii) optimize for product goals. To address shortcomings of prior platforms, we introduce the general principles and architecture of Looper, an ML platform with simple APIs for decision-making and feedback collection. Looper covers the end-to-end ML lifecycle from collecting training data and model training to deployment and inference, and extends support to personalization, causal evaluation with heterogeneous treatment effects, and Bayesian tuning for product goals. During the 2021 production deployment, Looper simultaneously hosted 440-1,000 ML models that made 4-6 million real-time decisions per second. We summarize the experiences of platform adopters and describe their learning curve.