Evaluating explanation techniques with human subjects is costly, time-consuming, and prone to subjectivity. To evaluate the accuracy of local explanations, we require access to the true feature importance scores for a given instance. However, the prediction function of a model usually does not decompose into linear additive terms that indicate how much each feature contributes to the output. In this work, we suggest instead focusing on the log odds ratio (LOR) of the prediction function, which naturally decomposes into additive terms for logistic regression and naive Bayes. We demonstrate how our proposed approach can be used to benchmark different explanation techniques in terms of their similarity to the LOR scores. In our experiments, we compare prominent local explanation techniques and find that their performance can depend on the underlying model, the dataset, the data point being explained, the normalization of the data, and the similarity metric.
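The additive decomposition of the LOR for logistic regression can be illustrated with a minimal sketch. The weights, bias, and instance below are hypothetical values chosen for illustration, not taken from the paper; the point is only that the log odds of the predicted probability equal the sum of per-feature terms w_i * x_i plus the intercept.

```python
import numpy as np

# Hypothetical logistic regression parameters and instance (illustrative only)
w = np.array([1.5, -2.0, 0.5])
b = 0.3
x = np.array([0.8, 1.1, -0.4])

# Model prediction: p(y=1 | x) = sigmoid(w . x + b)
p = 1.0 / (1.0 + np.exp(-(w @ x + b)))

# The log odds ratio recovers the linear score ...
log_odds = np.log(p / (1.0 - p))

# ... which decomposes into additive per-feature contributions w_i * x_i.
# These contributions serve as ground-truth feature importance scores
# against which local explanation techniques can be compared.
contributions = w * x
assert np.isclose(contributions.sum() + b, log_odds)
```

Under this view, the vector `contributions` plays the role of the reference importance scores, and an explanation technique is judged by how similar its attributions are to it.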