Automated machine learning (AutoML) strives for the automatic configuration of machine learning algorithms and their composition into an overall (software) solution, a machine learning pipeline, tailored to the learning task (dataset) at hand. Over the last decade, AutoML has developed into an independent research field with hundreds of contributions. While AutoML offers many prospects, it is also known to be quite resource-intensive, which is one of its major points of criticism. The primary cause of this high resource consumption is that many approaches rely on the (costly) evaluation of many machine learning pipelines while searching for good candidates. This problem is amplified in research on AutoML methods, where large-scale experiments are conducted with many datasets and approaches, each run with several repetitions to rule out random effects. In the spirit of recent work on Green AI, this paper attempts to raise AutoML researchers' awareness of the problem and to elaborate on possible remedies. To this end, we identify four categories of actions the community may take towards more sustainable research on AutoML, i.e., Green AutoML: design of AutoML systems, benchmarking, transparency, and research incentives.