Recent work on training neural retrievers for open-domain question answering (OpenQA) has employed both supervised and unsupervised approaches. However, it remains unclear how unsupervised and supervised methods can be used most effectively for neural retrievers. In this work, we systematically study retriever pre-training. We first propose an approach of unsupervised pre-training with the Inverse Cloze Task and masked salient spans, followed by supervised fine-tuning using question-context pairs. This approach leads to absolute gains of 2+ points over the previous best result in top-20 retrieval accuracy on the Natural Questions and TriviaQA datasets. We also explore two approaches for end-to-end supervised training of the reader and retriever components in OpenQA models. In the first approach, the reader considers each retrieved document separately, while in the second approach, the reader considers all the retrieved documents together. Our experiments demonstrate the effectiveness of these approaches as we obtain new state-of-the-art results. On the Natural Questions dataset, we obtain a top-20 retrieval accuracy of 84%, an improvement of 5 points over the recent DPR model. In addition, we achieve good results on answer extraction, outperforming recent models like REALM and RAG by 3+ points. We further scale up end-to-end training to large models and show consistent gains in performance over smaller models.
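To make the retriever training setup described above concrete, below is a minimal sketch (not the authors' code) of the dual-encoder contrastive objective that underlies both ICT-style unsupervised pre-training and supervised fine-tuning on question-context pairs: queries (or, for ICT, sentences standing in for queries) and contexts are encoded separately, and the other contexts in the batch serve as negatives. The function name, embedding dimension, and batch size are illustrative assumptions.

```python
# Minimal sketch of a dual-encoder retriever loss with in-batch negatives.
# The encoders themselves (e.g., BERT-style towers) are omitted; random
# embeddings stand in for their outputs purely to make the example runnable.
import torch
import torch.nn.functional as F

def retriever_loss(question_emb: torch.Tensor, context_emb: torch.Tensor) -> torch.Tensor:
    """question_emb, context_emb: [batch, dim]; row i of each forms a positive pair."""
    # Dot-product similarity of every question to every context in the batch.
    scores = question_emb @ context_emb.t()                      # [batch, batch]
    # The gold context for question i is context i; all other rows act as negatives.
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)

# Usage with stand-in embeddings (hypothetical sizes: batch 8, dim 768).
q = torch.randn(8, 768, requires_grad=True)
c = torch.randn(8, 768, requires_grad=True)
loss = retriever_loss(q, c)
loss.backward()
```

The same loss is reused across stages; what changes is where the positive pairs come from (pseudo-queries cut from passages during pre-training versus annotated question-context pairs during fine-tuning).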


