As big data grows ubiquitous across many domains, more and more stakeholders seek to develop Machine Learning (ML) applications on their data. The success of an ML application usually depends on close collaboration between ML experts and domain experts. However, the shortage of ML engineers remains a fundamental problem. Low-code machine learning tools and platforms (a.k.a. AutoML) aim to democratize ML development for domain experts by automating many repetitive tasks in the ML pipeline. This research presents an empirical study of around 14k posts (questions + accepted answers) from Stack Overflow (SO) that contain AutoML-related discussions. We examine how the topics discussed in these posts are distributed across the Machine Learning Life Cycle (MLLC) phases, and how popular and difficult they are. This study offers several interesting findings. First, we find 13 AutoML topics, which we group into four categories. The MLOps topic category (43% of questions) is the largest, followed by Model (28%), Data (27%), and Documentation (2%). Second, most questions are asked during the Model training (29%) (i.e., implementation) and Data preparation (25%) MLLC phases. Third, AutoML practitioners find the MLOps topic category the most challenging, especially topics related to model deployment and monitoring and to automated ML pipelines. These findings have implications for all three groups of AutoML stakeholders: AutoML researchers, AutoML service vendors, and AutoML developers. Collaboration between academia and industry can improve different aspects of AutoML, such as better DevOps/deployment support and tutorial-based documentation.