Current methods for Black-Box NLP interpretability, such as LIME or SHAP, are based on altering the text to be interpreted by removing words and modeling the Black-Box response. In this paper, we outline the limitations of this approach when applied to complex BERT-based classifiers: word-based sampling produces texts that are out-of-distribution for the classifier and, moreover, gives rise to a high-dimensional search space that cannot be explored sufficiently when time or computing power is limited. Both of these challenges can be addressed by using segments as the elementary building blocks of NLP interpretability. As an illustration, we show that the simple choice of sentences as segments greatly mitigates both challenges. As a consequence, the resulting explainer attains much better fidelity on a benchmark classification task.
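The following is a minimal sketch of the idea of sentence-level (rather than word-level) perturbation in a LIME-style surrogate explainer. It is an illustration only, not the paper's implementation: `black_box_predict` is a hypothetical placeholder for any classifier that returns a score per text (e.g. a BERT-based model), and the sentence splitter and ridge surrogate are assumed choices.

```python
# Sketch: attribute a black-box prediction to sentences instead of words.
# Assumptions: `black_box_predict` is a stand-in for the real classifier;
# sentences are split with a simple regex; a ridge regression serves as the
# interpretable surrogate model.

import re
import numpy as np
from sklearn.linear_model import Ridge


def black_box_predict(texts):
    """Placeholder classifier: returns a dummy score per text (assumed interface)."""
    return np.array([min(1.0, 0.3 + 0.001 * len(t)) for t in texts])


def explain_with_sentences(text, n_samples=200, seed=0):
    """Estimate per-sentence importances via a linear surrogate over sentence masks."""
    rng = np.random.default_rng(seed)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    d = len(sentences)

    # Binary masks: 1 keeps a sentence, 0 removes it. With sentences as the
    # elementary units, the search space has only 2^d points instead of
    # 2^(number of words).
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1  # always include the unperturbed text

    perturbed = [" ".join(s for s, m in zip(sentences, row) if m) for row in masks]
    preds = black_box_predict(perturbed)

    # Coefficients of the linear surrogate act as sentence importances.
    surrogate = Ridge(alpha=1.0).fit(masks, preds)
    return list(zip(sentences, surrogate.coef_))


if __name__ == "__main__":
    demo = "The plot is thin. The acting, however, is superb. I would watch it again."
    for sentence, weight in explain_with_sentences(demo):
        print(f"{weight:+.3f}  {sentence}")
```

Because the perturbed texts remain sequences of intact sentences, they also stay closer to the classifier's training distribution than texts with individual words deleted.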