Hyperparameter tuning is a fundamental, yet computationally expensive, step in optimizing machine learning models. Beyond optimization, understanding the relative importance and interactions of hyperparameters is critical to efficient model development. In this paper, we introduce MetaSHAP, a scalable semi-automated eXplainable AI (XAI) method that uses meta-learning and Shapley value analysis to provide actionable and dataset-aware tuning insights. MetaSHAP operates over a vast benchmark of over 9 million evaluated machine learning pipelines, allowing it to produce interpretable importance scores and actionable tuning insights that reveal how much each hyperparameter matters, how it interacts with others, and in which value ranges its influence is concentrated. For a given algorithm and dataset, MetaSHAP learns a surrogate performance model from historical configurations, computes hyperparameter interactions using SHAP-based analysis, and derives interpretable tuning ranges from the most influential hyperparameters. This allows practitioners not only to prioritize which hyperparameters to tune, but also to understand their directionality and interactions. We empirically validate MetaSHAP on a diverse benchmark of 164 classification datasets and 14 classifiers, demonstrating that it produces reliable importance rankings and competitive performance when used to guide Bayesian optimization.
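The core idea of attributing performance to individual hyperparameters via a surrogate model can be illustrated with a minimal, self-contained sketch. The surrogate function, the hyperparameter names, and the baseline configuration below are all hypothetical stand-ins for MetaSHAP's learned surrogate and benchmark data; the Shapley values are computed exactly by enumerating feature orderings, rather than with the approximations a SHAP library would use at scale.

```python
from itertools import permutations

def surrogate(lr, depth, n_est):
    # Hypothetical surrogate performance model, standing in for the model
    # MetaSHAP would fit on historical (configuration, score) pairs.
    return (0.7
            + 0.15 * (1 - abs(lr - 0.1))      # learning rate dominates
            + 0.05 * min(depth, 6) / 6        # depth matters moderately
            + 0.02 * min(n_est, 100) / 100)   # ensemble size matters least

# Configuration to explain, and a baseline used to marginalize absent
# hyperparameters (both illustrative values, not from the paper).
config = {"lr": 0.1, "depth": 8, "n_est": 200}
baseline = {"lr": 0.5, "depth": 1, "n_est": 10}

def value(subset):
    """Surrogate score when only hyperparameters in `subset` take their
    tuned values; the rest are held at the baseline."""
    args = {k: (config[k] if k in subset else baseline[k]) for k in config}
    return surrogate(**args)

# Exact Shapley values: average marginal contribution of each
# hyperparameter over all orderings in which it can be added.
names = list(config)
perms = list(permutations(names))
shapley = {k: 0.0 for k in names}
for perm in perms:
    present = set()
    for k in perm:
        before = value(present)
        present.add(k)
        shapley[k] += (value(present) - before) / len(perms)

# Importance ranking: which hyperparameters to prioritize when tuning.
ranking = sorted(names, key=lambda k: abs(shapley[k]), reverse=True)
print(ranking)
```

Because Shapley values satisfy the efficiency property, the attributions sum to the gap between the tuned and baseline surrogate scores, which is what makes the resulting ranking a principled answer to "which hyperparameter mattered most here."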