When developing and analyzing new hyperparameter optimization (HPO) methods, it is vital to evaluate and compare them empirically on well-curated benchmark suites. In this work, we propose a new set of challenging and relevant benchmark problems, motivated by desirable properties and requirements for such benchmarks. Our new surrogate-based benchmark collection consists of 14 scenarios that together constitute over 700 multi-fidelity HPO problems, all of which support multi-objective optimization. Furthermore, we empirically compare surrogate-based benchmarks to the more widely used tabular benchmarks and demonstrate that the latter may produce unfaithful results regarding the performance ranking of HPO methods. We examine our benchmark collection against the defined requirements and propose a single-objective and a multi-objective benchmark suite, on which we compare 7 single-objective and 7 multi-objective optimizers in a benchmark experiment. Our software is available at [https://github.com/slds-lmu/yahpo_gym].
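The point about tabular benchmarks distorting rankings can be made concrete with a toy sketch. Everything in it is hypothetical and unrelated to the YAHPO Gym API: `true_error`, `GRID`, `tabular_benchmark`, and `fit_surrogate` are illustrative names. It assumes a one-dimensional search space (log10 learning rate) with a made-up quadratic error curve. The tabular benchmark can only answer queries by snapping them to the nearest recorded grid point. The surrogate is a model fitted to those same records; here a polynomial interpolant stands in for the learned regression models a real surrogate benchmark would use. It shows one plausible mechanism only and does not reproduce the paper's experiment.

```python
from typing import Callable, Dict, List

# Hypothetical ground truth: validation error as a smooth function of
# log10(learning rate). The optimum (-2.2) lies between grid points.
def true_error(log_lr: float) -> float:
    return 0.1 * (log_lr + 2.2) ** 2

# Hypothetical tabular benchmark: errors were only recorded on a coarse grid.
GRID: List[float] = [-4.0, -3.0, -2.0, -1.0]
TABLE: Dict[float, float] = {x: true_error(x) for x in GRID}

def tabular_benchmark(log_lr: float) -> float:
    """Answer a query by snapping it to the nearest tabulated configuration."""
    nearest = min(GRID, key=lambda g: abs(g - log_lr))
    return TABLE[nearest]

def fit_surrogate(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Toy surrogate: the Lagrange interpolating polynomial through the table."""
    def predict(x: float) -> float:
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict

surrogate_benchmark = fit_surrogate(GRID, [TABLE[x] for x in GRID])

# Two hypothetical optimizers return different final configurations.
proposals = {"optimizer_a": -2.3, "optimizer_b": -1.6}
for name, x in proposals.items():
    print(f"{name}: tabular={tabular_benchmark(x):.4f} "
          f"surrogate={surrogate_benchmark(x):.4f} true={true_error(x):.4f}")
```

Both proposals snap to the grid point -2.0, so the tabular benchmark reports identical scores and cannot rank the two optimizers, while the surrogate keeps them apart. Because the toy error curve is exactly quadratic, the interpolant reproduces it perfectly; a real surrogate only approximates the true response surface, and its fidelity has to be validated separately.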