Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically-generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5%, but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) whilst retaining high accuracy. Lastly, we demonstrate RePAQ's strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to "back-off" to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.
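To make the QA-pair retrieval and selective back-off ideas concrete, here is a minimal, illustrative sketch. It is not the paper's implementation: the `QAPairRetriever` class, the bag-of-words `embed` function (a crude stand-in for RePAQ's learned question encoder), and the abstention threshold are all assumptions introduced for illustration. The key behaviours it mirrors are (1) answering a test question by nearest-neighbour lookup over a cache of QA-pairs and (2) abstaining when the match is weak so a slower retrieve-and-read model can take over.

```python
import numpy as np

def embed(question: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words embedding via the hashing trick.
    A placeholder for a learned question encoder such as RePAQ's."""
    v = np.zeros(dim)
    for tok in question.lower().split():
        v[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

class QAPairRetriever:
    """Answer questions by nearest-neighbour lookup over cached QA-pairs."""

    def __init__(self, qa_pairs):
        # qa_pairs: iterable of (question, answer) strings, e.g. drawn from PAQ.
        self.questions, self.answers = zip(*qa_pairs)
        self.index = np.stack([embed(q) for q in self.questions])

    def query(self, question: str, threshold: float = 0.5):
        """Return (answer, score); answer is None if the retriever abstains."""
        sims = self.index @ embed(question)
        best = int(np.argmax(sims))
        score = float(sims[best])
        # Selective QA: abstain when the best cached question is a weak match,
        # allowing a back-off to a more expensive retrieve-and-read model.
        if score < threshold:
            return None, score
        return self.answers[best], score

# Usage: combine the fast retriever with a slow fallback system.
retriever = QAPairRetriever([
    ("who wrote hamlet", "William Shakespeare"),
    ("what is the capital of france", "Paris"),
])
answer, score = retriever.query("who is the author of hamlet")
if answer is None:
    answer = "<fall back to a retrieve-and-read model here>"
print(answer, round(score, 2))
```

The threshold trades coverage for precision: raising it makes the retriever abstain more often, deferring more questions to the expensive model, which is the mechanism behind the combined system described above.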