Neural passage retrieval is a new and promising approach in open retrieval question answering. In this work, we stress-test the Dense Passage Retriever (DPR) -- a state-of-the-art (SOTA) open domain neural retrieval model -- on closed and specialized target domains such as COVID-19, and find that it lags behind standard BM25 in this important real-world setting. To make DPR more robust under domain shift, we explore its fine-tuning with synthetic training examples, which we generate from unlabeled target domain text using a text-to-text generator. In our experiments, this noisy but fully automated target domain supervision gives DPR a sizable advantage over BM25 in out-of-domain settings, making it a more viable model in practice. Finally, an ensemble of BM25 and our improved DPR model yields the best results, further pushing the SOTA for open retrieval QA on multiple out-of-domain test sets.
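To make the described pipeline concrete, below is a minimal sketch, not the paper's exact implementation: it (1) generates synthetic (question, passage) training pairs from unlabeled target-domain passages with an off-the-shelf text-to-text model, and (2) combines BM25 and DPR scores at retrieval time via normalized interpolation. The generator checkpoint (google/flan-t5-base), the public DPR checkpoints, the prompt wording, and the interpolation weight are all illustrative assumptions, not choices taken from the paper.

```python
# Sketch: synthetic target-domain supervision for DPR + BM25/DPR score ensembling.
# All model names, prompts, and the weight alpha are assumptions for illustration.
import torch
from transformers import (
    pipeline,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
    DPRContextEncoder, DPRContextEncoderTokenizer,
)
from rank_bm25 import BM25Okapi

passages = [
    "COVID-19 is caused by the SARS-CoV-2 virus, first identified in 2019.",
    "BM25 is a bag-of-words ranking function used by search engines.",
]

# 1) Synthetic supervision: generate one question per unlabeled target-domain passage.
#    A generic instruction-tuned text-to-text model stands in for the paper's generator.
qg = pipeline("text2text-generation", model="google/flan-t5-base")
synthetic_pairs = [
    (qg(f"Generate a question answered by this passage: {p}")[0]["generated_text"], p)
    for p in passages
]
# `synthetic_pairs` would serve as (question, positive passage) examples
# for fine-tuning DPR on the target domain.

# 2) Hybrid retrieval: interpolate normalized BM25 and DPR relevance scores.
q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

query = "What virus causes COVID-19?"
with torch.no_grad():
    q_vec = q_enc(**q_tok(query, return_tensors="pt")).pooler_output          # [1, 768]
    p_vecs = c_enc(**c_tok(passages, return_tensors="pt",
                           padding=True, truncation=True)).pooler_output      # [N, 768]
dpr_scores = (q_vec @ p_vecs.T).squeeze(0)                                     # dot-product relevance

bm25 = BM25Okapi([p.lower().split() for p in passages])
bm25_scores = torch.tensor(bm25.get_scores(query.lower().split()), dtype=torch.float)

def normalize(s):
    # Min-max normalize so the two score scales are comparable.
    return (s - s.min()) / (s.max() - s.min() + 1e-6)

alpha = 0.5  # interpolation weight: an assumption, not a tuned value from the paper
hybrid = alpha * normalize(dpr_scores) + (1 - alpha) * normalize(bm25_scores)
print(sorted(zip(hybrid.tolist(), passages), reverse=True)[0][1])  # top-ranked passage
```

In practice the fine-tuning step on `synthetic_pairs` (omitted here) is what closes DPR's out-of-domain gap; the score interpolation shown is just one simple way to realize the BM25/DPR ensemble mentioned above.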