Transformer-based pre-trained models such as BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations remains unsettled. In this paper, we propose the \textbf{D}ependency-Enhanced \textbf{A}daptive \textbf{F}usion \textbf{A}ttention (\textbf{DAFA}), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, \textbf{\emph{(i)}} DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights. \textbf{\emph{(ii)}} It adopts an adaptive fusion module to integrate the obtained dependency information with the original semantic signals. \textbf{\emph{(iii)}} Moreover, DAFA reconstructs the attention calculation flow and provides better interpretability. By applying it to BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets, demonstrating the benefits of adaptively fusing dependency structure in semantic matching tasks.
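To make the described mechanism concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: attention scores calibrated by a dependency matrix, then adaptively fused with the original semantic attention through a learned gate. The class name `DependencyCalibratedAttention`, the gating layer, and the log-bias calibration are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DependencyCalibratedAttention(nn.Module):
    """Illustrative sketch (not the paper's code): attention scores are
    calibrated by a dependency matrix, and the dependency-aware context is
    fused with the original semantic context via a token-wise gate."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        # Hypothetical adaptive fusion gate over the two context vectors.
        self.gate = nn.Linear(2 * hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor, dependency_matrix: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden)
        # dependency_matrix: (batch, seq_len, seq_len), e.g. 1.0 on dependency arcs, 0.0 elsewhere
        q, k, v = self.query(hidden_states), self.key(hidden_states), self.value(hidden_states)
        scores = q @ k.transpose(-1, -2) / (q.size(-1) ** 0.5)

        # Original semantic attention vs. dependency-calibrated attention
        # (here the calibration is a simple additive log-bias from the dependency matrix).
        semantic_attn = F.softmax(scores, dim=-1)
        calibrated_attn = F.softmax(scores + torch.log(dependency_matrix + 1e-9), dim=-1)

        semantic_ctx = semantic_attn @ v
        dependency_ctx = calibrated_attn @ v

        # Adaptive fusion: the gate decides, per token, how much dependency signal to keep.
        g = torch.sigmoid(self.gate(torch.cat([semantic_ctx, dependency_ctx], dim=-1)))
        return g * dependency_ctx + (1 - g) * semantic_ctx


if __name__ == "__main__":
    batch, seq_len, hidden = 2, 5, 16
    layer = DependencyCalibratedAttention(hidden)
    h = torch.randn(batch, seq_len, hidden)
    # Toy dependency matrix: self-loops only, standing in for a parsed dependency structure.
    dep = torch.eye(seq_len).unsqueeze(0).repeat(batch, 1, 1)
    print(layer(h, dep).shape)  # torch.Size([2, 5, 16])
```

In an actual BERT-based setup, such a layer would replace or augment the self-attention computation inside the encoder, with the dependency matrix derived from an external parser over the input sentence pair.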