With the recent success of pre-trained language models, a growing number of researchers have focused on opening the "black box" of these models. Following this interest, we carry out a qualitative and quantitative analysis of constituency grammar in the attention heads of BERT and RoBERTa. We employ the syntactic distance method to extract implicit constituency grammar from the attention weights of each head. Our results show that certain heads induce some grammar types considerably better than the baselines, suggesting that these heads act as a proxy for constituency grammar. We also analyze how the constituency grammar inducing (CGI) ability of attention heads changes after fine-tuning on two kinds of tasks: sentence meaning similarity (SMS) tasks and natural language inference (NLI) tasks. Our results suggest that SMS tasks decrease the average CGI ability of upper layers, while NLI tasks increase it. Lastly, we investigate the connection between CGI ability and natural language understanding ability on the QQP and MNLI tasks.
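To make the syntactic distance method concrete, the following is a minimal sketch of one plausible instantiation: adjacent-word distances are derived from a single head's attention matrix (here via Jensen-Shannon divergence between the two words' outgoing attention distributions, an illustrative choice rather than the paper's exact measure), and a binary constituency tree is then induced by greedy splitting. The function names (`syntactic_distances`, `build_tree`) are hypothetical.

```python
import numpy as np

def syntactic_distances(attn, eps=1e-12):
    """Compute a distance between each pair of adjacent words from one
    head's attention matrix (seq_len x seq_len). The dissimilarity used
    here (Jensen-Shannon divergence) is an assumption for illustration."""
    n = attn.shape[0]
    dists = []
    for i in range(n - 1):
        p, q = attn[i] + eps, attn[i + 1] + eps
        p, q = p / p.sum(), q / q.sum()
        m = 0.5 * (p + q)
        jsd = 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))
        dists.append(float(jsd))
    return dists

def build_tree(words, dists):
    """Induce a binary tree by splitting at the largest adjacent
    distance and recursing on both halves."""
    if len(words) <= 1:
        return words[0] if words else None
    k = int(np.argmax(dists))
    return (build_tree(words[:k + 1], dists[:k]),
            build_tree(words[k + 1:], dists[k + 1:]))
```

Under this sketch, a head whose induced trees match gold constituents for a given grammar type more often than the baselines would be counted as having high CGI ability for that type.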