Hyperparameters in machine learning (ML) have received a fair amount of attention, and hyperparameter tuning has come to be regarded as an important step in the ML pipeline. But just how useful is such tuning? While smaller-scale experiments have been conducted previously, herein we carry out a large-scale investigation, specifically one involving 26 ML algorithms, 250 datasets (regression as well as binary and multinomial classification), 6 score metrics, and 28,857,600 algorithm runs. Analyzing the results, we conclude that for many ML algorithms we should not expect considerable gains from hyperparameter tuning on average; however, there may be some datasets for which default hyperparameters perform poorly, and this is truer for some algorithms than for others. By defining a single hp_score value, which combines an algorithm's accumulated statistics, we are able to rank the 26 ML algorithms from those expected to gain the most from hyperparameter tuning to those expected to gain the least. We believe such a study may serve ML practitioners at large.
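To make the ranking idea concrete, here is a minimal sketch of how a per-algorithm hp_score might be aggregated. The abstract does not give the paper's actual hp_score formula, so the aggregation below (mean tuned-vs-default improvement across dataset/metric pairs) and all names in it are illustrative assumptions only.

```python
# Hypothetical sketch: rank algorithms by a simple hp_score.
# Assumption: hp_score is taken here as the mean improvement of tuned
# hyperparameters over the defaults; the paper's real definition may differ.
from statistics import mean

def hp_score(results):
    """results: list of dicts, one per (dataset, metric) pair, each holding
    'default' and 'tuned' scores (higher is better). Returns the mean gain
    from tuning over the default hyperparameters."""
    return mean(r["tuned"] - r["default"] for r in results)

# Hypothetical usage with made-up numbers for two algorithms:
runs = {
    "RandomForest": [{"default": 0.91, "tuned": 0.92},
                     {"default": 0.88, "tuned": 0.89}],
    "SVM":          [{"default": 0.80, "tuned": 0.90},
                     {"default": 0.75, "tuned": 0.86}],
}
ranking = sorted(runs, key=lambda alg: hp_score(runs[alg]), reverse=True)
print(ranking)  # algorithms expected to gain most from tuning come first
```

Under this toy aggregation, an algorithm whose defaults are already near-optimal (RandomForest above) ranks low, while one that benefits substantially from tuning (SVM above) ranks high, mirroring the paper's ranking of algorithms by expected tuning gain.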