Transformer language models are state-of-the-art in a multitude of NLP tasks. Despite these successes, their opaqueness remains problematic. Recent methods that aim to make black-box models interpretable and explainable focus primarily on post-hoc explanations of (sometimes spurious) input-output correlations. Instead, we advocate prototype networks incorporated directly into the model architecture, which explain the reasoning process behind the network's decisions. Moreover, while our architecture performs on par with several language models, it also enables learning from user interactions. This not only offers a better understanding of language models but also uses human capabilities to incorporate knowledge beyond the rigid range of purely data-driven approaches.
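To make the contrast with post-hoc explanation concrete, the following is a minimal sketch of a generic prototype-based classification head. It is not the authors' architecture. It assumes a fixed embedding vector standing in for a transformer's sentence representation, a ProtoPNet-style log-similarity, and a linear layer from prototype similarities to class logits. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_similarities(
    embedding: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    eps: float = 1e-4,
) -> List[float]:
    """ProtoPNet-style similarity log((d + 1) / (d + eps)); larger when closer.

    Assumption: this particular similarity function is one common choice,
    not necessarily the one used in the paper.
    """
    sims = []
    for p in prototypes:
        d = squared_distance(embedding, p)
        sims.append(math.log((d + 1.0) / (d + eps)))
    return sims


def classify(
    embedding: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    class_weights: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (logits, similarities).

    class_weights[c][j] is the weight connecting prototype j to class c, so
    every logit is a weighted sum of prototype similarities.
    """
    sims = prototype_similarities(embedding, prototypes)
    logits = [sum(w * s for w, s in zip(row, sims)) for row in class_weights]
    return logits, sims


def explain(sims: Sequence[float], top_k: int = 1) -> List[int]:
    """Indices of the top_k most similar prototypes: the model's own evidence."""
    return sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; a real model would use transformer outputs.
    prototypes = [[0.0, 0.0], [1.0, 1.0]]
    weights = [[1.0, 0.0], [0.0, 1.0]]  # class c is tied to prototype c
    logits, sims = classify([0.9, 1.0], prototypes, weights)
    print(logits, explain(sims))
```

Because each logit is a weighted sum of similarities to learned prototypes, the explanation ("this input looks like prototype j") is read off the same quantities that produce the prediction, rather than being reconstructed after the fact.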