Acceptability Judgements via Examining the Topology of Attention Maps

Daniil Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina Proskurina, Laida Kushnareva, Ekaterina Artemova, Serguei Barannikov, Irina Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev

The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP. However, the ability of the attention heads to judge the grammatical acceptability of a sentence has been underexplored. This paper approaches the paradigm of acceptability judgments with topological data analysis (TDA), showing that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics: binary judgments and linguistic minimal pairs. Topological features enhance the BERT-based acceptability classifier scores by 8%-24% on CoLA in three languages (English, Italian, and Swedish). By revealing the topological discrepancy between attention maps of minimal pairs, we achieve human-level performance on the BLiMP benchmark, outperforming nine statistical and Transformer LM baselines. At the same time, TDA provides the foundation for analyzing the linguistic functions of attention heads and interpreting the correspondence between the graph features and grammatical phenomena.
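As a rough illustration of what "topological features of the attention graph" can mean, the sketch below thresholds a single attention map into an undirected graph and counts its edges, connected components (the 0th Betti number), and independent cycles (the 1st Betti number). The function name `attention_graph_features`, the symmetrization by maximum, and the toy matrix are assumptions made for this example. It is not the authors' feature pipeline, which the abstract does not specify.

```python
def attention_graph_features(attn, threshold):
    """Count simple graph features of a thresholded attention map.

    attn is a square list of lists of attention weights (hypothetical input).
    Tokens i and j are linked when the larger of attn[i][j] and attn[j][i]
    reaches the threshold; this symmetrization is an assumption of the sketch.
    """
    n = len(attn)
    parent = list(range(n))  # union-find forest over tokens

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = 0
    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if max(attn[i][j], attn[j][i]) >= threshold:
                edges += 1
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
                    components -= 1

    # For an undirected graph, the number of independent cycles
    # is E - V + C.
    return {
        "edges": edges,
        "components": components,
        "cycles": edges - n + components,
    }


# Toy 3-token attention map (made-up values; rows sum to 1).
toy_attention = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.2, 0.6],
]
features_by_threshold = {
    t: attention_graph_features(toy_attention, t) for t in (0.25, 0.15)
}
```

Evaluating the same map at several thresholds, as in `features_by_threshold`, gives a small feature vector per attention head that a downstream classifier could consume.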


Related content

Attention mechanism

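The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly in a machine translation task; their work is generally regarded as the first application of attention to NLP. Attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic.

(Figure: the general trend of progress in attention research.)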
