With the boom of pre-trained transformers, representation-based models built on Siamese transformer encoders have become the mainstream technique for efficient text matching. However, compared with interaction-based models, these models suffer from severe performance degradation due to the lack of interaction between the texts in a pair. Prior work attempts to address this by performing extra interaction on top of the Siamese-encoded representations, while the interaction during encoding itself remains ignored. To remedy this, we propose a \textit{Virtual} InteRacTion mechanism (VIRT) that transfers interactive knowledge from interaction-based models into Siamese encoders through attention map distillation. As a train-time-only component, VIRT fully preserves the high efficiency of the Siamese structure and adds no extra computational cost at inference. To fully exploit the learned interactive knowledge, we further design a VIRT-adapted interaction strategy. Experimental results on multiple text matching datasets demonstrate that our method outperforms state-of-the-art representation-based models. Moreover, VIRT can be easily integrated into existing representation-based methods to achieve further improvements.
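To make the attention-map distillation idea concrete, below is a minimal PyTorch sketch. All names, shapes, and the choice of KL divergence as the distillation objective are illustrative assumptions for exposition, not the paper's actual implementation; the point is only to show how a "virtual" cross-attention map can be reconstructed from the two Siamese towers' separately computed queries and keys and matched against a cross-encoder teacher's attention.

\begin{verbatim}
# A minimal sketch of attention-map distillation between a cross-encoder
# teacher and a Siamese student. Names and shapes are assumptions.
import torch
import torch.nn.functional as F

def virt_distillation_loss(q_x, k_y, teacher_attn_xy):
    """KL divergence between the teacher's cross-attention sub-map
    (tokens of x attending to tokens of y) and the student's "virtual"
    cross-attention, built from queries/keys the Siamese towers
    computed independently.

    q_x:             (batch, len_x, d_k) student queries for text x
    k_y:             (batch, len_y, d_k) student keys for text y
    teacher_attn_xy: (batch, len_x, len_y) teacher attention probs
    """
    d_k = q_x.size(-1)
    # Virtual interaction: the attention the student *would* produce
    # if its two towers could attend to each other during encoding.
    virtual_scores = q_x @ k_y.transpose(-1, -2) / d_k ** 0.5
    student_log_probs = F.log_softmax(virtual_scores, dim=-1)
    # Distill the teacher's interactive attention into the student.
    return F.kl_div(student_log_probs, teacher_attn_xy,
                    reduction="batchmean")

# Usage sketch with random tensors standing in for encoder outputs.
q_x = torch.randn(8, 16, 64)
k_y = torch.randn(8, 24, 64)
teacher_attn_xy = torch.softmax(torch.randn(8, 16, 24), dim=-1)
loss = virt_distillation_loss(q_x, k_y, teacher_attn_xy)
\end{verbatim}

Because this loss is computed only during training, dropping it at inference leaves the Siamese towers untouched, which is what keeps the runtime cost identical to a plain representation-based model.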