Missing values occur frequently in real data and affect the interpretations produced by interpretable machine learning (IML) methods. Recent work focuses on bias and shows that model explanations can differ between imputation methods, but it ignores the additional imputation uncertainty and its influence on variance and confidence intervals. We therefore compare the effects of different imputation methods on the confidence interval coverage probabilities of three IML methods: permutation feature importance, partial dependence plots, and Shapley values. We show that single imputation underestimates the variance and that, in most cases, only multiple imputation achieves close to nominal coverage.
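To make the pooling idea concrete, the following is a minimal sketch (not the paper's implementation) of how multiple imputation combined with Rubin's rules can yield confidence intervals for permutation feature importance. The simulated data, the random forest model, and the choice of m = 20 imputations are illustrative assumptions.

```python
# Sketch: pool permutation feature importance over multiply imputed datasets
# using Rubin's rules. All modelling choices here are illustrative assumptions.
import numpy as np
from scipy import stats
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.RandomState(0)
n, p, m = 500, 5, 20                                # samples, features, imputations
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0]) + rng.normal(size=n)
X_miss = X.copy()
X_miss[rng.random(size=X.shape) < 0.2] = np.nan     # 20% MCAR missingness (assumed)

est, wvar = [], []                                  # per-imputation estimates / variances
for seed in range(m):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    X_imp = imputer.fit_transform(X_miss)
    model = RandomForestRegressor(random_state=seed).fit(X_imp, y)
    pfi = permutation_importance(model, X_imp, y, n_repeats=30, random_state=seed)
    est.append(pfi.importances_mean)
    # within-imputation variance of the mean importance across permutation repeats
    wvar.append(pfi.importances.var(axis=1, ddof=1) / pfi.importances.shape[1])

est, wvar = np.array(est), np.array(wvar)
q_bar = est.mean(axis=0)                            # pooled PFI estimate
u_bar = wvar.mean(axis=0)                           # mean within-imputation variance
b = est.var(axis=0, ddof=1)                         # between-imputation variance
t_var = u_bar + (1 + 1 / m) * b                     # Rubin's total variance
df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
half = stats.t.ppf(0.975, df) * np.sqrt(t_var)
for j in range(p):
    print(f"feature {j}: PFI = {q_bar[j]:.3f}, 95% CI [{q_bar[j] - half[j]:.3f}, {q_bar[j] + half[j]:.3f}]")
```

A single imputation corresponds to m = 1, where the between-imputation term drops out; this is the mechanism by which single imputation understates the total variance and yields intervals that are too narrow.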