Modern machine learning models are often constructed with multiple objectives in mind, e.g., minimizing inference time while also maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return such candidate models, and the approximation of the Pareto front is used to assess their performance. In practice, we also want to measure generalization when moving from the validation to the test set. However, some of the models might no longer be Pareto-optimal, which makes it unclear how to quantify the performance of the MHPO method when evaluated on the test set. To resolve this, we provide a novel evaluation protocol that allows measuring the generalization performance of MHPO methods and studying its capabilities for comparing two optimization experiments.
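The core difficulty can be illustrated with a minimal sketch (hypothetical objective values, both objectives minimized): configurations forming the Pareto front on validation data are re-evaluated on the test set, where some of them may become dominated and thus drop out of the front.

```python
import numpy as np

def pareto_mask(points):
    """Boolean mask of non-dominated rows; all objectives are minimized."""
    points = np.asarray(points, dtype=float)
    mask = np.ones(len(points), dtype=bool)
    for i in range(len(points)):
        # A point is dominated if some other point is <= in every objective
        # and strictly < in at least one.
        dominated = np.all(points <= points[i], axis=1) & np.any(points < points[i], axis=1)
        if dominated.any():
            mask[i] = False
    return mask

# Hypothetical (error rate, inference time) values for five hyperparameter
# configurations, measured on validation and on test data.
val  = np.array([[0.10, 3.0], [0.12, 2.0], [0.20, 1.0], [0.13, 2.5], [0.14, 2.2]])
test = np.array([[0.14, 3.1], [0.13, 2.1], [0.22, 1.1], [0.15, 2.6], [0.16, 2.3]])

val_front = pareto_mask(val)                       # configurations returned as Pareto-optimal
surviving = pareto_mask(test[val_front])           # which of them stay non-dominated on test

print("Pareto-optimal on validation:", np.flatnonzero(val_front))          # [0 1 2]
print("Still non-dominated on test: ", np.flatnonzero(val_front)[surviving])  # [1 2]
```

In this toy example, configuration 0 is Pareto-optimal on the validation objectives but becomes dominated once the same objectives are measured on the test set, which is exactly the situation that makes standard Pareto-front metrics ambiguous and motivates the proposed evaluation protocol.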