A shortcoming of black-box supervised learning models is their lack of interpretability or transparency. To facilitate interpretation, post-hoc global variable importance measures (VIMs) are widely used to assign to each predictor or input variable a numerical score that represents the extent to which that predictor impacts the fitted model's response predictions across the training data. It is well known that the most common existing VIMs, namely marginal Shapley and marginal permutation-based methods, can produce unreliable results if the predictors are highly correlated, because they require extrapolation of the response at predictor values that fall far outside the training data. Conditional versions of Shapley and permutation VIMs avoid or reduce the extrapolation but can substantially deflate the importance of correlated predictors. For the related goal of visualizing the effects of each predictor when strong predictor correlation is present, accumulated local effects (ALE) plots were recently introduced and have been widely adopted. This paper presents a new VIM approach based on ALE concepts that avoids both the extrapolation and the VIM deflation problems when predictors are correlated. We contrast, both theoretically and numerically, ALE VIMs with Shapley and permutation VIMs. Our results indicate that ALE VIMs produce variable importance rankings similar to those of Shapley and permutation VIMs when predictor correlations are mild, and more reliable rankings when correlations are strong. An additional advantage is that ALE VIMs are far less computationally expensive.
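The extrapolation issue described for marginal permutation VIMs can be illustrated with a minimal sketch. It is not the paper's method or experiment: the bivariate Gaussian data, the function names `mahalanobis_sq` and `fraction_off_support`, and the choice of the largest observed Mahalanobis distance as a stand-in for the edge of the training data are all assumptions made for illustration. The sketch permutes one of two correlated predictors and reports the share of permuted rows that land farther from the data centre than any original row did.

```python
import numpy as np


def mahalanobis_sq(points, mean, cov_inv):
    """Squared Mahalanobis distance of each row of `points` from `mean`."""
    centered = points - mean
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def fraction_off_support(rho, n=2000, seed=0):
    """Share of rows pushed outside the training envelope by permuting x1.

    Hypothetical setup: two standard Gaussian predictors with correlation
    `rho`. The "envelope" is assumed to be the largest squared Mahalanobis
    distance observed among the original rows.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)

    # Envelope estimated from the original (unpermuted) training data.
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    threshold = mahalanobis_sq(X, mean, cov_inv).max()

    # Marginal permutation: shuffle x1 independently of x2,
    # which breaks the dependence between the two predictors.
    X_perm = X.copy()
    X_perm[:, 0] = rng.permutation(X[:, 0])

    return float(np.mean(mahalanobis_sq(X_perm, mean, cov_inv) > threshold))


if __name__ == "__main__":
    for rho in (0.1, 0.5, 0.95):
        print(f"rho={rho}: off-support share = {fraction_off_support(rho):.3f}")
```

Under these assumptions, the off-support share grows with the correlation, so a marginal permutation VIM would query the fitted model at an increasing number of points where it was never trained.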


