Per-instance automated algorithm configuration and selection have gained significant momentum in evolutionary computation in recent years. Two crucial, sometimes implicit, ingredients for these automated machine learning (AutoML) methods are (1) feature-based representations of the problem instances and (2) performance prediction methods that take the features as input to estimate how well a specific algorithm instance will perform on a given problem instance. Unsurprisingly, common machine learning models fail to make accurate predictions for instances whose feature-based representation is underrepresented or not covered in the training data, resulting in poor generalization to problems not seen during training. In this work, we study leave-one-problem-out (LOPO) performance prediction. We analyze whether standard random forest (RF) model predictions can be improved by calibrating them with a weighted average of performance values obtained by the algorithm on problem instances that are sufficiently close to the problem for which a performance prediction is sought, where closeness is measured by cosine similarity in feature space. While our RF+clust approach obtains more accurate performance predictions for several problems, its predictive power crucially depends on the chosen similarity threshold as well as on the feature portfolio over which the cosine similarity is measured, thereby opening a new angle for feature selection in a zero-shot learning setting, as LOPO is termed in machine learning.
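The calibration idea described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact procedure: the function name `rf_clust_predict`, the fallback behavior when no neighbor exceeds the threshold, and the plain averaging of the RF prediction with the similarity-weighted estimate are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_clust_predict(rf, X_train, y_train, x_new, sim_threshold=0.9):
    """Calibrate an RF performance prediction with a cosine-similarity-
    weighted average of performances observed on nearby training problems.

    NOTE: the 50/50 combination of the two estimates and the fallback to
    the plain RF prediction are illustrative assumptions, not necessarily
    the exact rule used in the RF+clust paper.
    """
    # Cosine similarity between the new problem instance and each
    # training instance in feature space.
    norms = np.linalg.norm(X_train, axis=1) * np.linalg.norm(x_new)
    sims = (X_train @ x_new) / norms

    rf_pred = rf.predict(x_new.reshape(1, -1))[0]

    # Keep only training problems that are sufficiently similar.
    close = sims >= sim_threshold
    if not close.any():
        # No sufficiently similar problems: fall back to the RF prediction.
        return rf_pred

    # Similarity-weighted average of the observed performance values.
    weighted_perf = np.average(y_train[close], weights=sims[close])

    # Combine the model-based and similarity-based estimates.
    return 0.5 * (rf_pred + weighted_perf)
```

As the abstract notes, the choice of `sim_threshold` and of the feature portfolio entering the cosine similarity strongly affects the quality of this calibration.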