Automated Machine Learning (AutoML) has been used successfully in settings where the learning task is assumed to be static. In many real-world scenarios, however, the data distribution will evolve over time, and it is yet to be shown whether AutoML techniques can effectively design online pipelines in dynamic environments. This study aims to automate pipeline design for online learning while continuously adapting to data drift. For this purpose, we design an adaptive Online Automated Machine Learning (OAML) system, searching the complete pipeline configuration space of online learners, including preprocessing algorithms and ensembling techniques. This system combines the inherent adaptation capabilities of online learners with the fast automated pipeline (re)optimization capabilities of AutoML. Focusing on optimization techniques that can adapt to evolving objectives, we evaluate asynchronous genetic programming and asynchronous successive halving to optimize these pipelines continually. We experiment on real and artificial data streams with varying types of concept drift to test the performance and adaptation capabilities of the proposed system. The results confirm the utility of OAML over popular online learning algorithms and underscore the benefits of continuous pipeline redesign in the presence of data drift.
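To make the idea of continual pipeline re-optimization on a drifting stream concrete, here is a minimal, self-contained sketch. It is not the OAML system described above. It uses plain synchronous successive halving rather than the asynchronous variants the study evaluates. A toy online logistic regression with a tunable learning rate stands in for a full pipeline, and the stream is synthetic with a single abrupt drift. All names (`make_stream`, `OnlineLogReg`, `successive_halving`, `run`) and all parameter values are assumptions made for illustration.

```python
import math
import random
from collections import deque


def make_stream(n, drift_at, seed=0):
    """Yield (x, y) pairs; the labelling rule flips at index `drift_at` (abrupt drift)."""
    rng = random.Random(seed)
    for t in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        sign = 1 if t < drift_at else -1
        yield x, int(sign * (x[0] + x[1]) > 0)


class OnlineLogReg:
    """Toy online learner (hypothetical stand-in for a full pipeline)."""

    def __init__(self, lr):
        self.lr, self.w, self.b = lr, [0.0, 0.0], 0.0

    def _prob(self, x):
        z = self.w[0] * x[0] + self.w[1] * x[1] + self.b
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return int(self._prob(x) > 0.5)

    def learn_one(self, x, y):
        g = y - self._prob(x)  # gradient of the log-likelihood w.r.t. z
        self.w[0] += self.lr * g * x[0]
        self.w[1] += self.lr * g * x[1]
        self.b += self.lr * g


def prequential_accuracy(model, data):
    """Test-then-train: predict each example before learning from it."""
    correct = 0
    for x, y in data:
        correct += int(model.predict_one(x) == y)
        model.learn_one(x, y)
    return correct / len(data)


def successive_halving(configs, window, min_budget=50, eta=2):
    """Score configs on the most recent `budget` examples, keep the top 1/eta, grow the budget."""
    survivors, budget = list(configs), min_budget
    while len(survivors) > 1:
        recent = window[-min(budget, len(window)):]
        scored = sorted(
            ((prequential_accuracy(OnlineLogReg(lr), recent), lr) for lr in survivors),
            reverse=True,
        )
        survivors = [lr for _, lr in scored[: max(1, len(survivors) // eta)]]
        budget *= eta
    return survivors[0]


def run(n=2000, drift_at=1000, chunk=250, configs=(0.001, 0.01, 0.1, 1.0)):
    """Evaluate prequentially; after every chunk, re-run the search on recent data and redeploy."""
    model, recent, accs, correct = OnlineLogReg(configs[0]), deque(maxlen=chunk), [], 0
    for t, (x, y) in enumerate(make_stream(n, drift_at), start=1):
        correct += int(model.predict_one(x) == y)
        model.learn_one(x, y)
        recent.append((x, y))
        if t % chunk == 0:
            accs.append(correct / chunk)
            correct = 0
            data = list(recent)
            model = OnlineLogReg(successive_halving(configs, data))
            for xi, yi in data:  # warm-start the new model on the recent window
                model.learn_one(xi, yi)
    return accs


if __name__ == "__main__":
    print(run())  # per-chunk prequential accuracy
```

The sketch re-optimizes on a fixed schedule for simplicity. A drift detector could trigger the search instead, and a real system would search over preprocessing steps and ensembles as well as learner hyperparameters.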