Explainability has become a central requirement for the development, deployment, and adoption of machine learning (ML) models, yet we do not fully understand what explanation methods can and cannot do. Several factors, such as the data, the model's prediction, the hyperparameters used during training, and random initialization, can all influence downstream explanations. While previous work has empirically hinted that explanations (E) may have little relationship with the prediction (Y), a conclusive study quantifying this relationship has been lacking. Our work borrows tools from causal inference to systematically assay this relationship. More specifically, we measure the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors, i.e., the hyperparameters and the inputs used to generate saliency-based Es or Ys. We discover that Y's relative direct influence on E follows an odd pattern: the influence is higher in the lowest-performing models than in mid-performing models, and it then decreases in the top-performing models. We believe our work is a promising first step towards providing better guidance for practitioners, who can make more informed decisions about using these explanations by knowing which factors are at play and how they relate to their end task.
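To make the causal framing concrete, one plausible formalization of the estimand sketched above is an average treatment effect of intervening on a causal ancestor, such as a hyperparameter H, measured separately on the explanation E and the prediction Y. This is an illustrative sketch rather than the paper's exact estimand; the summary functionals d(·) and ℓ(·) and the intervention values h, h' are assumptions introduced here for exposition:

\[
\mathrm{TE}_{H \to E} = \mathbb{E}\big[d(E) \mid do(H = h)\big] - \mathbb{E}\big[d(E) \mid do(H = h')\big],
\qquad
\mathrm{TE}_{H \to Y} = \mathbb{E}\big[\ell(Y) \mid do(H = h)\big] - \mathbb{E}\big[\ell(Y) \mid do(H = h')\big],
\]

where $do(H = h)$ denotes setting the hyperparameter to $h$ while holding the rest of the training pipeline fixed, and $d(\cdot)$ and $\ell(\cdot)$ are summary statistics of the explanation and prediction, respectively (e.g., a distance between saliency maps and a change in predicted probability). Comparing $\mathrm{TE}_{H \to E}$ against $\mathrm{TE}_{H \to Y}$ across such interventions quantifies how strongly E and Y co-vary under their shared causal ancestors.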