Neural rationale models are popular for making interpretable predictions on NLP tasks. In these models, a selector extracts segments of the input text, called rationales, and passes them to a classifier for prediction. Since the rationale is the only information accessible to the classifier, it is plausibly defined as the explanation. Is such a characterization unconditionally correct? In this paper, we argue to the contrary, with both philosophical perspectives and empirical evidence suggesting that rationale models are, perhaps, less rational and interpretable than expected. We call for more rigorous and comprehensive evaluations of these models to ensure that the desired properties of interpretability are indeed achieved. The code can be found at https://github.com/yimingz89/Neural-Rationale-Analysis.
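To make the selector-classifier setup described above concrete, the following is a minimal sketch of a rationale model in PyTorch. It is not the architecture used in the paper or the linked repository; the module sizes, the straight-through masking trick, and all hyperparameters are illustrative assumptions.

```python
# Minimal selector-classifier rationale model (illustrative sketch only).
import torch
import torch.nn as nn


class RationaleModel(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Selector: scores each token; high-scoring tokens form the rationale.
        self.selector = nn.Sequential(
            nn.Linear(emb_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )
        # Classifier: sees only the masked (selected) token embeddings.
        self.classifier = nn.Sequential(
            nn.Linear(emb_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, num_classes)
        )

    def forward(self, token_ids):
        emb = self.embed(token_ids)                  # (batch, seq, emb_dim)
        probs = torch.sigmoid(self.selector(emb))    # (batch, seq, 1)
        # Hard 0/1 rationale mask with a straight-through gradient estimator
        # (an assumed choice; other selection mechanisms exist).
        mask = (probs > 0.5).float() + probs - probs.detach()
        masked = emb * mask                          # non-rationale tokens are zeroed out
        pooled = masked.sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.classifier(pooled), mask.squeeze(-1)


# Usage: the returned mask marks which tokens were passed to the classifier,
# i.e., the tokens conventionally treated as the "explanation".
model = RationaleModel()
logits, rationale_mask = model(torch.randint(0, 10000, (4, 20)))
```

The key property the abstract questions is visible here: only the masked tokens reach the classifier, which is why the selected segments are commonly presented as the model's explanation.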