Complex machine learning algorithms are increasingly used in critical tasks involving text data, leading to the development of interpretability methods. Among local methods, two families have emerged: those that compute an importance score for each feature and those that extract simple logical rules. In this paper we show that different methods can produce unexpectedly different explanations, even when applied to simple models for which we would expect qualitative agreement. To quantify this effect, we propose a new approach to compare explanations produced by different methods.