Machine learning (ML) is increasingly used to inform high-stakes decisions. As complex ML models (e.g., deep neural networks) are often considered black boxes, a wealth of procedures has been developed to shed light on their inner workings and the ways in which their predictions come about, defining the field of 'explainable AI' (XAI). Saliency methods rank input features according to some measure of 'importance'. Such methods are difficult to validate since a formal definition of feature importance is, thus far, lacking. It has been demonstrated that some saliency methods can highlight features that have no statistical association with the prediction target (suppressor variables). To avoid misinterpretations due to such behavior, we propose the actual presence of such an association as a necessary condition for, and objective preliminary definition of, feature importance. We carefully craft a ground-truth dataset in which all statistical dependencies are well-defined and linear, serving as a benchmark to study the problem of suppressor variables. We evaluate common explanation methods including LRP, DTD, PatternNet, PatternAttribution, LIME, Anchors, SHAP, and permutation-based methods with respect to our objective definition. We show that most of these methods are unable to distinguish important features from suppressors in this setting.
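To make the suppressor phenomenon concrete, the following minimal sketch (not the paper's actual benchmark; all variable names and the two-feature setup are illustrative assumptions) constructs a linear example in which an optimal model must assign a large weight to a feature that has no statistical association with the target:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Target signal z and a distractor d, drawn independently of each other.
z = rng.standard_normal(n)   # prediction target
d = rng.standard_normal(n)   # distractor / noise source

# Feature 1 carries the signal but is contaminated by the distractor;
# feature 2 is a pure suppressor: it contains only the distractor.
x1 = z + d
x2 = d
X = np.column_stack([x1, x2])
y = z

# Ordinary least squares recovers weights close to (1, -1): the model
# needs a large negative weight on x2 to cancel the shared distractor,
# even though x2 is statistically independent of y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("weights:", w)                              # approx [ 1., -1.]
print("corr(x2, y):", np.corrcoef(x2, y)[0, 1])   # approx 0.
```

Under these assumptions, any explanation method that attributes importance in proportion to the magnitude of the model weights would flag x2 as important despite its (near-)zero correlation with y; this is exactly the kind of misinterpretation that the proposed association-based definition of feature importance is meant to expose.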