When developing and analyzing new hyperparameter optimization (HPO) methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we list desirable properties and requirements for such benchmarks and propose a new set of challenging and relevant multifidelity HPO benchmark problems motivated by these requirements. For this, we revisit the concept of surrogate-based benchmarks and empirically compare them to the more widely used tabular benchmarks, showing that the latter may induce bias in the performance estimation and ranking of HPO methods. We present a new surrogate-based benchmark suite for multifidelity HPO methods consisting of 9 benchmark collections that constitute over 700 multifidelity HPO problems in total. All our benchmarks also allow querying multiple optimization targets, enabling the benchmarking of multi-objective HPO. We examine and compare our benchmark suite with respect to the defined requirements and show that our benchmarks provide viable additions to existing suites.
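To make the querying model concrete, below is a minimal sketch of what a multifidelity, multi-objective query against a surrogate-based benchmark might look like. All names here (`SurrogateBenchmark`, `query`, the `"lcbench"` scenario string, and the returned target names) are illustrative assumptions for exposition, not the suite's actual API.

```python
# Hypothetical sketch: querying a surrogate-based multifidelity benchmark.
# A surrogate benchmark replaces real training runs with a fitted model
# that predicts multiple targets for any (configuration, fidelity) pair.
from typing import Dict


class SurrogateBenchmark:
    """Placeholder for a fitted surrogate that predicts several
    optimization targets (e.g. validation error and training time)
    for a hyperparameter configuration at a chosen fidelity."""

    def __init__(self, scenario: str):
        self.scenario = scenario
        # A real suite would load a pretrained surrogate model here.

    def query(self, config: Dict[str, float], fidelity: float) -> Dict[str, float]:
        # A real implementation would evaluate the surrogate; dummy
        # values keep this sketch self-contained and runnable.
        return {
            "val_error": 0.25 / (1.0 + fidelity),  # improves with fidelity
            "train_time": 10.0 * fidelity,         # cost grows with fidelity
        }


# Usage: a cheap low-fidelity query (e.g. few epochs) returns multiple
# targets at once, which is what enables multi-objective HPO benchmarking.
bench = SurrogateBenchmark("lcbench")
result = bench.query({"learning_rate": 1e-3, "weight_decay": 1e-4}, fidelity=0.1)
print(result)
```

Because every query is a model prediction rather than a real training run, such a surrogate can be evaluated at arbitrary points of a continuous configuration and fidelity space, which is exactly where tabular benchmarks, restricted to a pre-evaluated grid, can bias method comparisons.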