We propose a novel method for applying Transformer models to extractive question answering (QA) tasks. Recently, pretrained generative sequence-to-sequence (seq2seq) models have achieved great success in question answering. Contributing to the success of these models are internal attention mechanisms such as cross-attention. We propose a simple strategy to obtain an extractive answer span from the generative model by leveraging the decoder cross-attention patterns. Viewing cross-attention as an architectural prior, we apply joint training to further improve QA performance. Empirical results show that on open-domain question answering datasets like NaturalQuestions and TriviaQA, our method approaches state-of-the-art performance on both generative and extractive inference, all while using far fewer parameters. Furthermore, this strategy allows us to perform hallucination-free inference while conferring significant improvements to the model's ability to rerank relevant passages.
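As a rough illustration of the span-extraction idea (not the paper's implementation), the sketch below uses an off-the-shelf T5 model from HuggingFace Transformers to aggregate decoder cross-attention weights over input tokens and read off the highest-scoring contiguous window as an extractive answer. The model choice, the aggregation (mean over decoding steps, layers, and heads), and the 10-token span cap are all assumptions made only for illustration.

```python
import torch
from transformers import T5TokenizerFast, T5ForConditionalGeneration

# Illustrative sketch: score input tokens by decoder cross-attention mass and
# extract the best contiguous span. Model and hyperparameters are assumptions.
tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = ("question: who wrote Hamlet? "
        "context: Hamlet is a tragedy written by William Shakespeare.")
inputs = tokenizer(text, return_tensors="pt")

# Generate an answer while keeping the cross-attention weights.
out = model.generate(
    **inputs,
    max_new_tokens=8,
    output_attentions=True,
    return_dict_in_generate=True,
)

# out.cross_attentions: one entry per generated step; each is a tuple over layers
# of tensors shaped (batch, heads, 1, input_len). Average over steps/layers/heads.
step_scores = []
for step in out.cross_attentions:
    layers = torch.stack(step)                               # (layers, batch, heads, 1, input_len)
    step_scores.append(layers.mean(dim=(0, 2)).squeeze(1))   # (batch, input_len)
token_scores = torch.stack(step_scores).mean(dim=0)[0]       # (input_len,)

# Pick the highest-scoring contiguous window (capped at 10 tokens, an assumption).
best_span, best_score = (0, 0), float("-inf")
for start in range(len(token_scores)):
    for end in range(start, min(start + 10, len(token_scores))):
        score = token_scores[start:end + 1].mean().item()
        if score > best_score:
            best_score, best_span = score, (start, end)

span_ids = inputs["input_ids"][0][best_span[0]:best_span[1] + 1]
print("generative answer:", tokenizer.decode(out.sequences[0], skip_special_tokens=True))
print("extractive span:  ", tokenizer.decode(span_ids, skip_special_tokens=True))
```

Because the extracted span is copied verbatim from the input passage, this inference mode cannot produce tokens absent from the source, which is the sense in which the abstract describes it as hallucination-free.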