Machine learning algorithms such as random forests and XGBoost are becoming increasingly important and are being incorporated into production processes to enable comprehensive digitization and, where possible, automation. The hyperparameters of these algorithms have to be set appropriately, a task referred to as hyperparameter tuning or hyperparameter optimization. Based on the concept of tunability, this article presents an overview of theoretical and practical results for popular machine learning algorithms. The overview is accompanied by an experimental analysis of 30 hyperparameters from six relevant machine learning algorithms. In particular, it provides (i) a survey of important hyperparameters, (ii) two parameter tuning studies, (iii) one extensive global parameter tuning study, and (iv) a new way, based on consensus ranking, to analyze results from multiple algorithms. The R package mlr is used as a uniform interface to the machine learning models, and the R package SPOT is used to perform the actual tuning (optimization). All additional code is provided together with this paper.
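As a rough illustration of what tunability can mean in practice, the sketch below takes it to be the loss improvement obtained by tuning a single hyperparameter while all others stay at their defaults. This is one common reading of the term, not necessarily the exact definition used in the article. Everything in the block is hypothetical: `toy_loss` stands in for a cross-validated model loss, the names `learning_rate` and `max_depth`, their defaults, and their bounds are invented for the example, and plain random search replaces the mlr/SPOT workflow.

```python
import random

# Hypothetical stand-in for a cross-validated loss. In a real study this
# would train and evaluate a model with the given hyperparameters.
def toy_loss(params):
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["max_depth"] - 6) ** 2

# Assumed defaults and search bounds (illustrative values only).
DEFAULTS = {"learning_rate": 0.3, "max_depth": 6}
BOUNDS = {"learning_rate": (0.01, 1.0), "max_depth": (1, 15)}

def tune_one(name, loss, defaults, bounds, budget=200, seed=0):
    """Random search over one hyperparameter; the others stay at defaults.

    For simplicity every hyperparameter is sampled as a continuous value.
    """
    rng = random.Random(seed)
    lo, hi = bounds[name]
    best = loss(defaults)  # the default configuration is always a candidate
    for _ in range(budget):
        candidate = dict(defaults)
        candidate[name] = rng.uniform(lo, hi)
        best = min(best, loss(candidate))
    return best

def tunability(name, loss, defaults, bounds, **kwargs):
    """Loss improvement over the defaults from tuning only `name`."""
    return loss(defaults) - tune_one(name, loss, defaults, bounds, **kwargs)

scores = {name: tunability(name, toy_loss, DEFAULTS, BOUNDS) for name in DEFAULTS}
for name, gain in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {gain:.4f}")
```

In this toy setup the default `max_depth` already sits at the optimum, so tuning it yields no gain, whereas `learning_rate` starts far from its optimum and shows a clear improvement. Ranking hyperparameters by such gains is one simple way to decide where tuning effort is best spent.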