Automated Machine Learning (AutoML) systems have been shown to efficiently build good models for new datasets. However, it is often unclear how well they adapt when the data evolves over time. The main goal of this study is to understand how data stream challenges such as concept drift affect the performance of AutoML methods, and which adaptation strategies can make them more robust. To that end, we propose six concept drift adaptation strategies and evaluate their effectiveness across a variety of AutoML approaches for building machine learning pipelines, including those that leverage Bayesian optimization, genetic programming, and random search with automated stacking. These are evaluated empirically on real-world and synthetic data streams with different types of concept drift. Based on this analysis, we propose ways to develop more sophisticated and robust AutoML techniques.
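The abstract does not spell out the six adaptation strategies, so the following is only a generic illustration of the underlying idea: monitor the error rate on a stream and rebuild the model when it degrades. It is a minimal sketch under stated assumptions, not one of the authors' strategies. `MajorityClassModel` is a hypothetical stand-in for whatever pipeline an AutoML system would return, and the window size, threshold, and synthetic stream are arbitrary choices made for the example.

```python
from collections import Counter, deque


class MajorityClassModel:
    """Hypothetical placeholder for an AutoML-produced pipeline."""

    def fit(self, labels):
        # "Training" here just memorises the most frequent label.
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self):
        return self.prediction


def run_stream(labels, warmup=50, window=20, threshold=0.5, adapt=True):
    """Test-then-train evaluation with optional drift-triggered retraining.

    All parameter names and defaults are assumptions for illustration.
    """
    model = MajorityClassModel().fit(labels[:warmup])
    recent_errors = deque(maxlen=window)                   # sliding error window
    recent_labels = deque(labels[:warmup], maxlen=window)  # most recent data
    correct = 0
    retrain_points = []

    for t in range(warmup, len(labels)):
        y = labels[t]
        hit = model.predict() == y   # test first ...
        correct += hit
        recent_errors.append(0 if hit else 1)
        recent_labels.append(y)      # ... then make the label available

        window_full = len(recent_errors) == window
        if adapt and window_full and sum(recent_errors) / window > threshold:
            # Error rate in the window exceeded the threshold: treat this as
            # drift and rebuild the model on the most recent window only.
            model = MajorityClassModel().fit(list(recent_labels))
            recent_errors.clear()
            retrain_points.append(t)

    accuracy = correct / (len(labels) - warmup)
    return accuracy, retrain_points


# Synthetic stream with one abrupt concept drift at position 200.
stream = [0] * 200 + [1] * 200
acc_static, _ = run_stream(stream, adapt=False)
acc_adaptive, retrains = run_stream(stream, adapt=True)
print(f"static: {acc_static:.3f}  adaptive: {acc_adaptive:.3f}  retrained at: {retrains}")
```

On this toy stream the static model keeps predicting the pre-drift label, while the adaptive variant recovers shortly after the drift point. A real setup would replace the placeholder's `fit` with a fresh AutoML run or a refit of the previously found pipeline, and the simple threshold with a dedicated drift detector.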