Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on a variety of NLP tasks. Recent work has shown that attention-based models can benefit from more focused attention over local regions. Most existing approaches restrict the attention scope to a linear span, or are confined to certain tasks such as machine translation and question answering. In this paper, we propose a syntax-aware local attention, where the attention scopes are restricted based on distances in the syntactic structure. The proposed syntax-aware local attention can be integrated with pre-trained language models, such as BERT, to encourage the model to focus on syntactically relevant words. We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks. Experimental results show consistent gains over BERT on all benchmark datasets. Extensive studies verify that our model achieves better performance owing to its more focused attention over syntactically relevant words.
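The core idea of restricting attention by syntactic distance can be illustrated with a minimal sketch. The code below is an illustrative assumption, not the paper's implementation: it computes pairwise hop distances over a dependency tree (given as a `heads` array, a hypothetical input format) and masks attention scores so each token attends only to tokens within `max_dist` hops before the softmax.

```python
import numpy as np


def tree_distances(heads):
    """Pairwise hop distances between tokens in a dependency tree.

    heads[i] is the index of token i's head; -1 marks the root.
    """
    n = len(heads)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].append(h)
            adj[h].append(i)
    dist = np.full((n, n), np.inf)
    for s in range(n):  # BFS from each token
        dist[s, s] = 0
        queue = [s]
        while queue:
            u = queue.pop(0)
            for v in adj[u]:
                if dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist


def syntax_local_attention(scores, heads, max_dist=2):
    """Mask attention scores so each token attends only to tokens
    within max_dist hops in the syntactic tree, then apply softmax."""
    d = tree_distances(heads)
    masked = np.where(d <= max_dist, scores, -1e9)  # large negative -> ~0 weight
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


# Toy example: "the cat sat", with "the" headed by "cat", "cat" by "sat".
heads = [1, 2, -1]
scores = np.zeros((3, 3))  # uniform raw scores for clarity
probs = syntax_local_attention(scores, heads, max_dist=1)
```

With `max_dist=1` and uniform scores, "the" splits its attention between itself and "cat" (its head) and places essentially no weight on "sat", which is two hops away; in an actual model the mask would be applied to the scaled query-key scores inside each attention head.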