Why do large pre-trained transformer-based models perform so well across a wide variety of NLP tasks? Recent research suggests the key may lie in the multi-headed attention mechanism's ability to learn and represent linguistic information. Understanding how these models represent both syntactic and semantic knowledge is vital for investigating why they succeed and fail, what they have learned, and how they can improve. We present Dodrio, an open-source interactive visualization tool that helps NLP researchers and practitioners analyze attention mechanisms in transformer-based models with linguistic knowledge. Dodrio tightly integrates an overview that summarizes the roles of different attention heads with detailed views that help users compare attention weights with the syntactic structure and semantic information in the input text. To facilitate the visual comparison of attention weights and linguistic knowledge, Dodrio applies different graph visualization techniques to represent attention weights for longer input text. Case studies highlight how Dodrio provides insights into the attention mechanism in transformer-based models. Dodrio is available at https://poloclub.github.io/dodrio/.