Machine learning plays a crucial role in harnessing the massive amounts of data produced every day in our digital world. With the booming demand for machine learning applications, it has become clear that the number of knowledgeable data scientists cannot scale with growing data volumes and application needs. In response, several automated machine learning (AutoML) techniques and frameworks have been developed to fill this gap in human expertise by automating the process of building machine learning pipelines. In this study, we present a comprehensive evaluation and comparison of the performance characteristics of six popular AutoML frameworks, namely Auto-Weka, AutoSKlearn, TPOT, Recipe, ATM, and SmartML, across 100 data sets from established AutoML benchmark suites. Our experimental evaluation compares the frameworks along several dimensions, including the performance impact of key design decisions such as the time budget, the size of the search space, meta-learning, and ensemble construction. The results of our study reveal insights that can guide the design of AutoML frameworks.