Dual-Encoders are a promising mechanism for answer retrieval in question answering (QA) systems. Most conventional Dual-Encoders currently learn the semantic representations of questions and answers solely through the matching score. Researchers have proposed introducing QA interaction features into the scoring function, but at the cost of low efficiency at the inference stage. To keep the encoding of questions and answers independent at inference, a variational auto-encoder has further been introduced to reconstruct answers (questions) from question (answer) embeddings as an auxiliary task that enhances QA interaction in representation learning during training. However, text generation and answer retrieval have different requirements, which makes training difficult. In this work, we propose a framework that enhances the Dual-Encoders model with question-answer cross-embeddings and a novel Geometry Alignment Mechanism (GAM), which aligns the geometry of the embeddings from Dual-Encoders with that of the embeddings from Cross-Encoders. Extensive experimental results show that our framework significantly improves the Dual-Encoders model and outperforms the state-of-the-art method on multiple answer retrieval datasets.
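To make the two ideas in the abstract concrete, the following is a minimal sketch, not the method from the paper. It assumes that the matching score is a dot product between independently computed question and answer embeddings, and it stands in for geometry alignment with a hypothetical loss: the KL divergence between in-batch similarity distributions of the cross-embeddings and of the Dual-Encoders embeddings. The abstract does not define GAM, so the names `retrieval_scores` and `geometry_alignment_loss`, the loss form, and the random embeddings are all assumptions for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def retrieval_scores(q_emb, a_emb):
    """Dual-encoder matching score (assumed here to be a dot product).

    Questions and answers are encoded independently, so answer
    embeddings can be precomputed before any question arrives.
    """
    return q_emb @ a_emb.T


def geometry_alignment_loss(dual_emb, cross_emb):
    """Hypothetical alignment term, NOT the paper's GAM definition.

    Compares the in-batch similarity structure of the two embedding
    sets: KL(p || q), where p comes from the cross-embeddings (target)
    and q from the dual-encoder embeddings.
    """
    p = softmax(cross_emb @ cross_emb.T)
    q = softmax(dual_emb @ dual_emb.T)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))


# Random stand-ins for encoder outputs (illustrative only).
rng = np.random.default_rng(0)
q_emb = rng.normal(size=(4, 8))      # 4 questions, dimension 8
a_emb = rng.normal(size=(6, 8))      # 6 candidate answers
cross_emb = rng.normal(size=(4, 8))  # 4 cross-embeddings for the same batch

scores = retrieval_scores(q_emb, a_emb)
ranking = np.argsort(-scores, axis=1)  # best-scoring answer first

loss_same = geometry_alignment_loss(cross_emb, cross_emb)
loss_diff = geometry_alignment_loss(q_emb, cross_emb)
```

In this sketch the alignment term would only be used during training. At inference, retrieval needs just `retrieval_scores`, which is what keeps the encoding of questions and answers independent.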