This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.
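The retrieval component described above — bigram hashing combined with TF-IDF matching — can be illustrated with a minimal sketch. This is not the paper's implementation: the bucket count, tokenizer, hash function, and toy corpus are all illustrative assumptions (the actual system hashes into a much larger feature space and indexes full Wikipedia articles).

```python
import math
import hashlib
from collections import Counter

NUM_BUCKETS = 2 ** 20  # illustrative; a real system would use a larger space

def bigram_buckets(text):
    """Tokenize, form unigrams and word bigrams, and hash each into a fixed
    bucket range (feature hashing, so no bigram vocabulary is stored)."""
    tokens = text.lower().split()
    grams = tokens + [" ".join(p) for p in zip(tokens, tokens[1:])]
    return [
        int(hashlib.md5(g.encode("utf-8")).hexdigest(), 16) % NUM_BUCKETS
        for g in grams
    ]

def build_index(docs):
    """Per-document hashed term counts plus smoothed IDF weights."""
    counts = [Counter(bigram_buckets(d)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())
    n = len(docs)
    idf = {b: math.log((n + 1) / (df[b] + 1)) + 1 for b in df}
    return counts, idf

def scores(question, counts, idf):
    """TF-IDF weighted dot product between the question and each document."""
    q = Counter(bigram_buckets(question))
    return [
        sum(q[b] * c[b] * idf.get(b, 0.0) ** 2 for b in q)
        for c in counts
    ]

# Toy corpus standing in for Wikipedia articles.
docs = [
    "Paris is the capital of France .",
    "The Eiffel Tower is in Paris .",
    "Berlin is the capital of Germany .",
]
counts, idf = build_index(docs)
s = scores("capital of France", counts, idf)
best = max(range(len(docs)), key=lambda i: s[i])
```

In this sketch the question "capital of France" retrieves the first document, since the rare terms ("france", "of france") carry higher IDF weight than the terms shared with the Berlin article. A reader model would then extract the answer span from the retrieved paragraphs.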