Attention Can Reflect Syntactic Structure (If You Let It)

Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English -- a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We show that full trees can be decoded above baseline accuracy from single attention heads, and that individual relations are often tracked by the same heads across languages. Furthermore, in an attempt to address recent debates about the status of attention as an explanatory mechanism, we experiment with fine-tuning mBERT on a supervised parsing objective while freezing different series of parameters. Interestingly, in steering the objective to learn explicit linguistic structure, we find much of the same structure represented in the resulting attention patterns, with interesting differences with respect to which parameters are frozen.
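To make the idea of decoding a tree from a single attention head concrete, here is a minimal sketch. It is not the authors' procedure: it assumes a toy 4-token attention matrix, symmetrizes it, and extracts an undirected maximum spanning tree with Prim's algorithm. The function names `decode_undirected_tree` and `undirected_attachment_score`, the example matrix, and the gold edges are all hypothetical and chosen only for illustration.

```python
def decode_undirected_tree(attn):
    """Return the edges of a maximum spanning tree over a symmetrized
    attention matrix (a list of equal-length rows, one per token)."""
    n = len(attn)
    if n == 0:
        return []
    # Assumption: treat attention as undirected by keeping the stronger direction.
    score = [[max(attn[i][j], attn[j][i]) for j in range(n)] for i in range(n)]
    in_tree = {0}
    edges = []
    # Prim's algorithm: repeatedly attach the outside token with the
    # highest-scoring link to any token already in the tree.
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or score[i][j] > best[0]):
                    best = (score[i][j], i, j)
        _, i, j = best
        edges.append((min(i, j), max(i, j)))
        in_tree.add(j)
    return sorted(edges)


def undirected_attachment_score(pred_edges, gold_edges):
    """Fraction of gold edges recovered, ignoring direction and labels."""
    gold = {tuple(sorted(e)) for e in gold_edges}
    pred = {tuple(sorted(e)) for e in pred_edges}
    return len(pred & gold) / len(gold)


# Hypothetical attention weights for the tokens "the dog barked loudly".
toy_attn = [
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.1, 0.3, 0.2, 0.4],
    [0.1, 0.1, 0.7, 0.1],
]
toy_gold = [(0, 1), (1, 2), (2, 3)]  # made-up gold dependency edges

pred = decode_undirected_tree(toy_attn)
print(pred, undirected_attachment_score(pred, toy_gold))
```

In a real experiment the matrix would come from one head of a model such as mBERT, aggregated from subword tokens to words, and the score would be compared against a baseline; those steps are omitted here.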

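Related content

Attention mechanism

The attention mechanism was first proposed in computer vision, but it only became widely popular with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Bahdanau et al. then used an attention-like mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate" [1] to perform translation and alignment jointly on a machine translation task; their work is generally regarded as the first to apply attention to NLP. Attention-based extensions of RNN models were subsequently applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trajectory of attention research.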
