Transformer-based pre-trained models such as BERT have made substantial progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has shown general benefits across multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations remains an open question. In this paper, we propose the \textbf{D}ependency-Enhanced \textbf{A}daptive \textbf{F}usion \textbf{A}ttention (\textbf{DAFA}), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, \textbf{\emph{(i)}} DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights. \textbf{\emph{(ii)}} It adopts an adaptive fusion module to integrate the obtained dependency information with the original semantic signals. \textbf{\emph{(iii)}} Moreover, DAFA reconstructs the attention calculation flow and provides better interpretability. Applied to BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets, demonstrating the benefits of adaptively fusing dependency structure in semantic matching tasks.
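
The two ideas named above, a dependency matrix that calibrates attention weights and an adaptive fusion of structural and semantic signals, can be illustrated with a minimal NumPy sketch. This is not the paper's formulation: the binary head-child matrix, the masking scheme, the sigmoid gate, and the names `dependency_matrix`, `fused_attention`, and `gate_w` are all assumptions introduced here for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dependency_matrix(heads):
    """Build a symmetric 0/1 matrix from head indices (assumed encoding).

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    Each token is also linked to itself.
    """
    n = len(heads)
    dep = np.eye(n)
    for child, head in enumerate(heads):
        if head >= 0:
            dep[child, head] = 1.0
            dep[head, child] = 1.0
    return dep


def fused_attention(q, k, v, dep, gate_w):
    """Single-head attention with a hypothetical gated fusion.

    q, k, v: arrays of shape (n, d); dep: (n, n); gate_w: (2 * d, 1).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)

    # Semantic branch: ordinary scaled dot-product attention.
    sem_out = softmax(scores) @ v

    # Structural branch: keep only scores between dependency-linked tokens.
    masked = np.where(dep > 0, scores, -1e9)
    struct_out = softmax(masked) @ v

    # Assumed fusion: a per-token sigmoid gate over both branch outputs.
    gate_in = np.concatenate([sem_out, struct_out], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ gate_w)))
    return gate * sem_out + (1.0 - gate) * struct_out
```

In this sketch, a gate value near 1 leaves a token's representation close to plain attention, while a value near 0 restricts it to its syntactic neighbours. The head indices would come from an external dependency parser, which is outside the scope of the sketch.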