There exist several methods that aim to address the crucial task of understanding the behaviour of AI/ML models. Arguably, the most popular among them are local explanations, which focus on investigating model behaviour for individual instances. Several methods have been proposed for local analysis, but relatively less effort has gone into understanding whether the explanations are robust and accurately reflect the behaviour of the underlying models. In this work, we present a survey of works that analyse the robustness of two classes of local explanations (feature importance and counterfactual explanations) that are popularly used in analysing AI/ML models in finance. The survey aims to unify existing definitions of robustness, introduces a taxonomy to classify different robustness approaches, and discusses some interesting results. Finally, the survey offers some pointers on extending current robustness analysis approaches so as to identify reliable explainability methods.