Current Explainable AI (ExAI) methods, especially in NLP, are evaluated on various datasets with different metrics, each targeting different aspects. This lack of a common evaluation framework hinders tracking the progress of such methods and their wider adoption. In this work, inspired by offline information retrieval, we propose different metrics and techniques to evaluate the explainability of sentiment analysis (SA) models from two angles. First, we evaluate the strength of the extracted "rationales" in faithfully explaining the predicted outcome. Second, we measure the agreement between ExAI methods and human judgment on a homegrown dataset to reflect on the plausibility of the rationales. Our experiments cover four dimensions: (1) the underlying architectures of SA models, (2) the approach followed by the ExAI method, (3) the reasoning difficulty, and (4) the homogeneity of the ground-truth rationales. We empirically demonstrate that Anchors explanations are more aligned with human judgment and are more confident in extracting supporting rationales. As can be foreseen, the reasoning complexity of sentiment is shown to hinder ExAI methods from extracting supporting evidence. Moreover, a remarkable discrepancy is discerned between the results of different explainability methods on the various architectures, suggesting the need for their consolidation to achieve better performance. Predominantly, transformers are shown to exhibit better explainability than convolutional and recurrent architectures. Our work paves the way towards designing more interpretable NLP models and enabling a common evaluation ground for their relative strengths and robustness.
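To make the plausibility angle concrete, the sketch below illustrates one common way to score agreement between a method's extracted rationale tokens and human-annotated rationale tokens. This is a minimal token-level F1 formulation assumed for illustration, not the paper's exact metric; the function name and example tokens are hypothetical.

```python
from typing import Set


def rationale_f1(predicted: Set[str], human: Set[str]) -> float:
    """Token-level F1 between method-extracted and human-annotated rationales.

    An illustrative agreement score: precision and recall are computed over
    the overlap of rationale tokens, so 1.0 means perfect agreement and 0.0
    means no shared tokens. (Assumed formulation, not the paper's metric.)
    """
    if not predicted or not human:
        return 0.0
    overlap = predicted & human
    precision = len(overlap) / len(predicted)
    recall = len(overlap) / len(human)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example: for a review predicted "negative", an ExAI method highlights
# {"boring", "slow"} while human annotators marked {"boring", "predictable"}.
print(rationale_f1({"boring", "slow"}, {"boring", "predictable"}))  # 0.5
```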