Contrastive explanation methods go beyond transparency and address the contrastive aspect of explanations. Such explanations are emerging as an attractive option for providing actionable changes in scenarios adversely affected by classifiers' decisions. However, their extension to textual data is under-explored, and their vulnerabilities and limitations have received little investigation. This work motivates textual counterfactuals by laying the groundwork for a novel evaluation scheme inspired by the faithfulness of explanations. Accordingly, we extend the computation of three metrics (proximity, connectedness, and stability) to textual data, and we benchmark two successful contrastive methods, POLYJUICE and MiCE, on these metrics. Experiments on sentiment analysis data show that the connectedness of counterfactuals to their original counterparts is not evident for either method. More interestingly, the generated contrastive texts are more attainable with POLYJUICE, which highlights the significance of latent representations in counterfactual search. Finally, we perform the first semantic adversarial attack on textual recourse methods. The results demonstrate the robustness of POLYJUICE and the role that latent input representations play in robustness and reliability.