Automated hyperparameter optimization (HPO) can help practitioners obtain peak performance from machine learning models. However, it often provides little insight into how individual hyperparameters affect the final model performance. This lack of explainability makes it difficult to trust and understand the automated HPO process and its results. We suggest using interpretable machine learning (IML) to gain insights from the experimental data obtained during HPO with Bayesian optimization (BO). BO tends to focus on promising regions that are likely to contain high-performing configurations and thus induces a sampling bias. Hence, many IML techniques, such as the partial dependence plot (PDP), carry the risk of generating biased interpretations. By leveraging the posterior uncertainty of the BO surrogate model, we introduce a variant of the PDP with estimated confidence bands. We propose to partition the hyperparameter space to obtain more confident and reliable PDPs in relevant sub-regions. In an experimental study, we provide quantitative evidence for the increased quality of the PDPs within sub-regions.
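To make the idea of a PDP with confidence bands concrete, the following is a minimal sketch and not the authors' implementation. It assumes a zero-mean Gaussian process surrogate with a squared-exponential kernel of fixed length scale, a toy two-dimensional objective, and evaluations clustered around the centre of the space to mimic the sampling bias of BO. All names (rbf, gp_posterior, pdp_with_bands, background) are hypothetical. The band uses the fact that the average of jointly Gaussian predictions has variance equal to the sum of their covariance matrix divided by n squared.

```python
import numpy as np

def rbf(a, b, length_scale=0.3):
    # Squared-exponential kernel between the rows of a and b (assumed kernel).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, Xq, noise=1e-4):
    # Posterior mean and full covariance of a GP surrogate at query points Xq.
    y_mean = y.mean()
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xq, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha + y_mean, rbf(Xq, Xq) - v.T @ v

def pdp_with_bands(X, y, feature, grid, background, z=1.96):
    # For each grid value, fix the feature of interest, average the surrogate
    # over the background sample, and propagate the posterior covariance.
    means, sds = [], []
    for g in grid:
        Xq = background.copy()
        Xq[:, feature] = g
        m, C = gp_posterior(X, y, Xq)
        means.append(m.mean())
        var = C.sum() / len(Xq) ** 2  # variance of the averaged prediction
        sds.append(np.sqrt(max(var, 0.0)))  # guard against round-off
    means, sds = np.array(means), np.array(sds)
    return means, means - z * sds, means + z * sds

rng = np.random.default_rng(0)
# Evaluations concentrated near the centre: a stand-in for BO's sampling bias.
X = np.clip(rng.normal(0.5, 0.15, size=(40, 2)), 0.0, 1.0)
y = (X[:, 0] - 0.5) ** 2 + 0.5 * X[:, 1]  # toy objective (assumption)
background = rng.uniform(size=(50, 2))  # Monte Carlo sample of the space
grid = np.linspace(0.0, 1.0, 11)
mean, lower, upper = pdp_with_bands(X, y, feature=0, grid=grid,
                                    background=background)
```

Plotting lower and upper around mean gives the band. Restricting background (and, if desired, the grid) to a sub-region of the space is one simple way to compute a PDP for that sub-region only; how the sub-regions are chosen is not covered by this sketch.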