Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically-generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5%, but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) whilst retaining high accuracy. Lastly, we demonstrate RePAQ's strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to "back-off" to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.
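To make the QA-pair retrieval and selective back-off ideas concrete, here is a minimal, illustrative sketch. It is not the paper's implementation: the `QAPairRetriever` class, the bag-of-words `embed` function (a crude stand-in for RePAQ's learned question encoder), and the abstention threshold are all assumptions introduced for illustration. The key behaviours it mirrors are (1) answering a test question by nearest-neighbour lookup over a cache of QA-pairs and (2) abstaining when the match is weak so a slower retrieve-and-read model can take over.

```python
import numpy as np

def embed(question: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words embedding via the hashing trick.
    A placeholder for a learned question encoder such as RePAQ's."""
    v = np.zeros(dim)
    for tok in question.lower().split():
        v[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

class QAPairRetriever:
    """Answer questions by nearest-neighbour lookup over cached QA-pairs."""

    def __init__(self, qa_pairs):
        # qa_pairs: iterable of (question, answer) strings, e.g. drawn from PAQ.
        self.questions, self.answers = zip(*qa_pairs)
        self.index = np.stack([embed(q) for q in self.questions])

    def query(self, question: str, threshold: float = 0.5):
        """Return (answer, score); answer is None if the retriever abstains."""
        sims = self.index @ embed(question)
        best = int(np.argmax(sims))
        score = float(sims[best])
        # Selective QA: abstain when the best cached question is a weak match,
        # allowing a back-off to a more expensive retrieve-and-read model.
        if score < threshold:
            return None, score
        return self.answers[best], score

# Usage: combine the fast retriever with a slow fallback system.
retriever = QAPairRetriever([
    ("who wrote hamlet", "William Shakespeare"),
    ("what is the capital of france", "Paris"),
])
answer, score = retriever.query("who is the author of hamlet")
if answer is None:
    answer = "<fall back to a retrieve-and-read model here>"
print(answer, round(score, 2))
```

The threshold trades coverage for precision: raising it makes the retriever abstain more often, deferring more questions to the expensive model, which is the mechanism behind the combined system described above.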