Transformer-based pre-trained language models such as BERT have achieved remarkable results in Semantic Sentence Matching. However, existing models still lack the ability to capture subtle differences between sentences: minor noise such as word insertions, deletions, or substitutions can flip their predictions. To alleviate this problem, we propose a novel Dual Attention Enhanced BERT (DABERT) that strengthens BERT's ability to capture fine-grained differences in sentence pairs. DABERT comprises (1) a Dual Attention module, which measures soft word matches through a new dual-channel alignment mechanism that models both affinity attention and difference attention, and (2) an Adaptive Fusion module, which uses attention to learn how to aggregate the difference and affinity features and generates a vector describing the matching details of the sentence pair. Extensive experiments on well-studied semantic matching and robustness test datasets demonstrate the effectiveness of the proposed method.
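The dual-channel idea described above can be illustrated with a minimal PyTorch sketch. This is not the paper's exact formulation: the module name `DualAttentionSketch`, the subtraction-based difference score, and the sigmoid fusion gate are illustrative assumptions showing how an affinity channel and a difference channel over a sentence pair might be computed and adaptively fused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualAttentionSketch(nn.Module):
    """Illustrative sketch (not the paper's exact method): an affinity channel
    based on scaled dot-product cross-attention, a difference channel based on
    element-wise subtraction, and a learned gate that fuses the two."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        # gate that adaptively mixes affinity- and difference-attended features
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a: [batch, len_a, hidden] token states of sentence A
        # b: [batch, len_b, hidden] token states of sentence B
        q, k, v = self.query(a), self.key(b), self.value(b)
        d = q.size(-1)

        # affinity channel: standard scaled dot-product cross-attention
        affinity_scores = torch.matmul(q, k.transpose(-1, -2)) / (d ** 0.5)
        affinity = torch.matmul(F.softmax(affinity_scores, dim=-1), v)

        # difference channel: score token pairs by element-wise subtraction,
        # so dissimilar pairs receive larger weights (an assumption here)
        diff = q.unsqueeze(2) - k.unsqueeze(1)            # [batch, len_a, len_b, hidden]
        diff_scores = diff.abs().sum(-1) / (d ** 0.5)     # [batch, len_a, len_b]
        difference = torch.matmul(F.softmax(diff_scores, dim=-1), v)

        # adaptive fusion: a sigmoid gate decides the per-dimension mix
        g = torch.sigmoid(self.gate(torch.cat([affinity, difference], dim=-1)))
        return g * affinity + (1 - g) * difference


# Example usage with random tensors standing in for BERT token embeddings.
layer = DualAttentionSketch(hidden_size=768)
sent_a = torch.randn(2, 10, 768)
sent_b = torch.randn(2, 12, 768)
out = layer(sent_a, sent_b)   # [2, 10, 768]: sentence A represented against B
```

In this sketch the two channels share the same value projection and differ only in how alignment weights are computed; the gated sum is one simple stand-in for the attention-based aggregation that the Adaptive Fusion module performs in the paper.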