Modern machine learning models are often constructed taking into account multiple objectives, e.g., minimizing inference time while also maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return such candidate models, and the approximation of the Pareto front is used to assess their performance. However, when the generalization performance of a Pareto front found on a validation set is estimated by re-evaluating the individual models on a test set, some of these models might no longer be Pareto-optimal, which makes it unclear how to quantify the performance of the MHPO method. To resolve this, we provide a novel evaluation protocol that allows measuring the generalization performance of MHPO methods and studying its capabilities for comparing two optimization experiments.
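To make the underlying phenomenon concrete, the following is a minimal sketch (not the protocol proposed in this work) that re-checks Pareto dominance of validation-selected models on a test set; the metric values, the two objectives, and the helper `pareto_mask` are purely illustrative assumptions.

```python
import numpy as np


def pareto_mask(costs: np.ndarray) -> np.ndarray:
    """Boolean mask of non-dominated rows, assuming all objectives are minimized."""
    n = costs.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(costs, i, axis=0)
        # Row i is dominated if some other row is <= in every objective
        # and strictly < in at least one objective.
        dominated = np.any(
            np.all(others <= costs[i], axis=1) & np.any(others < costs[i], axis=1)
        )
        mask[i] = not dominated
    return mask


# Hypothetical (inference time, error) pairs for candidate configurations,
# measured on a validation and a test split; both objectives are minimized.
rng = np.random.default_rng(0)
val_costs = rng.random((20, 2))
test_costs = val_costs + rng.normal(scale=0.05, size=val_costs.shape)  # imperfect generalization

val_front = pareto_mask(val_costs)                  # models selected on validation
still_optimal = pareto_mask(test_costs[val_front])  # dominance re-checked on test

print(f"Pareto-optimal on validation: {val_front.sum()}")
print(f"Of those, still non-dominated on test: {still_optimal.sum()}")
```

Under this synthetic setup, some configurations that form the validation Pareto front are typically dominated once re-evaluated on the test metrics, which is exactly the ambiguity the proposed evaluation protocol addresses.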