This paper introduces a new framework for open-domain question answering in which the retriever and the reader iteratively interact with each other. The framework is agnostic to the architecture of the machine reading model, requiring only access to the reader's token-level hidden representations. The retriever uses fast nearest neighbor search to scale to corpora containing millions of paragraphs. At each step, a gated recurrent unit updates the query conditioned on the state of the reader, and the retriever uses the reformulated query to re-rank the paragraphs. Our analysis shows that iterative interaction helps in retrieving informative paragraphs from the corpus. Finally, we show that our multi-step reasoning framework brings consistent improvements when applied to two widely used reader architectures, DrQA and BiDAF, on several large open-domain datasets: TriviaQA-unfiltered, QuasarT, SearchQA, and SQuAD-Open.
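The retrieve, read, and reformulate loop described above can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions for illustration only: paragraphs and queries are small dense vectors, brute-force inner-product ranking stands in for fast nearest neighbor search, the GRU weights are random and untrained, and the hypothetical helper `toy_reader_state` (the mean of the retrieved paragraph vectors) stands in for a state computed from a real reader's token-level hidden representations. All function and class names are invented for this example.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """A standard GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def mat():
            return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

        self.wz, self.uz = mat(), mat()  # update gate
        self.wr, self.ur = mat(), mat()  # reset gate
        self.wh, self.uh = mat(), mat()  # candidate state

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wz, x), matvec(self.uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wr, x), matvec(self.ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.wh, x), matvec(self.uh, rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def top_k(query, paragraphs, k):
    """Rank paragraphs by inner product with the query (brute force)."""
    scores = [dot(query, p) for p in paragraphs]
    return sorted(range(len(paragraphs)), key=lambda i: scores[i], reverse=True)[:k]


def toy_reader_state(vectors):
    """Hypothetical stand-in for the reader state: mean of the retrieved vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def multi_step_retrieve(query, paragraphs, gru, steps=3, k=2):
    """Alternate between ranking paragraphs and reformulating the query."""
    history = []
    for _ in range(steps):
        ranked = top_k(query, paragraphs, k)
        history.append(ranked)
        state = toy_reader_state([paragraphs[i] for i in ranked])
        # The GRU treats the current query as its hidden state and the
        # reader state as its input, producing the reformulated query.
        query = gru.step(state, query)
    return history, query


rng = random.Random(1)
paragraphs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
history, final_query = multi_step_retrieve([0.5, -0.2, 0.1, 0.9], paragraphs, GRUCell(4))
for step, ranked in enumerate(history):
    print(f"step {step}: top paragraphs {ranked}")
```

Because the weights are untrained, the rankings printed here carry no meaning; the sketch only shows the control flow in which the query is re-encoded after each read and then used for the next round of ranking.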