Online and Scalable Model Selection with Multi-Armed Bandits

Many online applications running on live traffic are powered by machine learning models, for which training, validation, and hyper-parameter tuning are conducted on historical data. However, models that demonstrate strong performance in offline analysis commonly perform worse when deployed online. This problem is a consequence of the difficulty of training on historical data in non-stationary environments. Moreover, the machine learning metrics used for model selection may not correlate sufficiently with the real-world business metrics used to determine the success of the applications being tested. These problems are particularly prominent in the Real-Time Bidding (RTB) domain, in which ML models power bidding strategies, and a change in models will likely affect the performance of the advertising campaigns. In this work, we present Automatic Model Selector (AMS), a system for scalable online selection of RTB bidding strategies based on real-world performance metrics. AMS employs Multi-Armed Bandits (MAB) to run and evaluate multiple models near-simultaneously against live traffic, allocating the most traffic to the best-performing models while decreasing traffic to those with poorer online performance, thereby minimizing the impact of inferior models on overall campaign performance. AMS avoids relying on offline data, instead selecting models on a case-by-case basis according to actionable business goals. AMS allows new models to be safely introduced into live campaigns as soon as they are developed, minimizing the risk to overall performance. In live-traffic tests on multiple ad campaigns, the AMS system proved highly effective at improving ad campaign performance.
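
The traffic-allocation idea described above can be illustrated with a minimal sketch. The abstract does not state which bandit algorithm AMS uses, so the example below assumes Beta-Bernoulli Thompson sampling with a binary success signal (for example, a conversion). The class `ThompsonSelector`, the function `simulate`, the model names, and the success rates are all hypothetical and chosen only for illustration.

```python
import random


class ThompsonSelector:
    """Beta-Bernoulli Thompson sampling over candidate models (hypothetical helper)."""

    def __init__(self, model_names, seed=None):
        self.rng = random.Random(seed)
        # One [alpha, beta] pair per model, starting from a uniform Beta(1, 1) prior.
        self.params = {name: [1, 1] for name in model_names}

    def choose(self):
        # Draw a plausible success rate for each model and route to the highest draw.
        draws = {
            name: self.rng.betavariate(alpha, beta)
            for name, (alpha, beta) in self.params.items()
        }
        return max(draws, key=draws.get)

    def update(self, name, success):
        # Record the observed outcome: alpha counts successes, beta counts failures.
        self.params[name][0 if success else 1] += 1


def simulate(true_rates, rounds=5000, seed=0):
    """Simulate traffic allocation; true_rates are made-up success probabilities."""
    selector = ThompsonSelector(true_rates, seed=seed)
    environment = random.Random(seed + 1)
    traffic = {name: 0 for name in true_rates}
    for _ in range(rounds):
        name = selector.choose()
        traffic[name] += 1
        # The simulated environment decides whether this request succeeded.
        selector.update(name, environment.random() < true_rates[name])
    return traffic


if __name__ == "__main__":
    # Assumed rates for three hypothetical bidding models.
    print(simulate({"model_a": 0.05, "model_b": 0.10, "model_c": 0.30}))
```

Under these assumptions, the model with the highest simulated success rate ends up receiving most of the requests, while weaker models still receive a small share of exploratory traffic. A production system would replace the simulated outcome with a real business metric and would likely need to handle delayed or non-binary feedback.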
