The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP. However, the ability of attention heads to judge the grammatical acceptability of a sentence has been underexplored. This paper approaches the paradigm of acceptability judgments with topological data analysis (TDA), showing that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics: binary judgments and linguistic minimal pairs. Topological features improve the scores of the BERT-based acceptability classifier by 8% to 24% on CoLA in three languages (English, Italian, and Swedish). By revealing the topological discrepancy between the attention maps of minimal pairs, we achieve human-level performance on the BLiMP benchmark, outperforming nine statistical and Transformer LM baselines. At the same time, TDA provides a foundation for analyzing the linguistic functions of attention heads and for interpreting the correspondence between graph features and grammatical phenomena.
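To make "topological features of the attention graph" concrete, here is a minimal sketch built on several illustrative assumptions. It is not the paper's feature set or pipeline. The sketch treats each token as a vertex, symmetrizes a single head's attention matrix with `max(a_ij, a_ji)`, and ignores self-attention on the diagonal. It then computes two quantities. The first is the Betti numbers of the graph thresholded at a chosen attention weight: b0 is the number of connected components and b1 is the number of independent cycles. The second is the total H0 persistence of the filtration in which edge (i, j) enters at `1 - max(a_ij, a_ji)`. The function names, the thresholds, and the `1 - attention` weighting are hypothetical choices made for illustration.

```python
from itertools import combinations


def _find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def attention_graph_edges(attn, threshold):
    """Undirected edges (i, j), i < j, whose symmetrized weight max(a_ij, a_ji) >= threshold.

    Assumption: the diagonal (self-attention) is ignored, so the graph has no self-loops.
    """
    n = len(attn)
    return [(i, j) for i, j in combinations(range(n), 2)
            if max(attn[i][j], attn[j][i]) >= threshold]


def betti_numbers(attn, threshold):
    """Return (b0, b1) of the thresholded attention graph.

    b0 = number of connected components; b1 = cycle rank |E| - |V| + b0.
    """
    n = len(attn)
    parent = list(range(n))
    edges = attention_graph_edges(attn, threshold)
    components = n
    for i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components, len(edges) - n + components


def h0_total_persistence(attn):
    """Sum of finite H0 bar lengths when edge (i, j) enters at 1 - max(a_ij, a_ji).

    Every vertex is born at 0 and each merge kills one bar, so the finite
    bars die exactly at the minimum-spanning-tree edge weights (Kruskal).
    """
    n = len(attn)
    weighted = sorted((1.0 - max(attn[i][j], attn[j][i]), i, j)
                      for i, j in combinations(range(n), 2))
    parent = list(range(n))
    total = 0.0
    for w, i, j in weighted:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            total += w
    return total


if __name__ == "__main__":
    # Hypothetical 4-token attention map for one head (rows sum to 1, as after softmax).
    attn = [
        [0.70, 0.20, 0.05, 0.05],
        [0.30, 0.40, 0.20, 0.10],
        [0.10, 0.50, 0.30, 0.10],
        [0.10, 0.10, 0.10, 0.70],
    ]
    print(betti_numbers(attn, 0.25))  # (2, 0)
    print(betti_numbers(attn, 0.10))  # (1, 3)
    print(round(h0_total_persistence(attn), 6))
```

In a setup like this, the per-head values at a few thresholds could be concatenated into a feature vector for an acceptability classifier. For a minimal pair, the same features could be computed for both sentences and their difference inspected. How the paper actually selects heads, features, and decision rules is not specified here.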