Automated machine learning (AutoML) strives for the automatic configuration of machine learning (ML) algorithms and their composition into an overall (software) solution, a machine learning pipeline, tailored to the learning task (dataset) at hand. Over the last decade, AutoML has become a hot research topic with hundreds of contributions. While AutoML offers many prospects, it is also known to be quite resource-intensive, which is one of its major points of criticism. The primary cause of this high resource consumption is that many approaches rely on the (costly) evaluation of many ML pipelines while searching for good candidates. This problem is amplified in research on AutoML methods, where large-scale experiments are conducted with many datasets and approaches, each run with several repetitions to rule out random effects. In the spirit of recent work on Green AI, this paper attempts to raise awareness of the problem among AutoML researchers and to elaborate on possible remedies. To this end, we identify four categories of actions the community may take towards more sustainable research on AutoML, namely approach design, benchmarking, research incentives, and transparency.