Current approaches to extractive question answering (QA) rely heavily on jointly encoding the document and the question. In this paper, we formalize a new modular variant of extractive QA, Phrase-Indexed Question Answering (PI-QA), that enforces complete independence of the document encoder from the question by building a standalone representation of the document discourse, a key research goal in machine reading comprehension. That is, the document encoder generates an index vector for each answer candidate phrase in the document; at inference time, each question is mapped to the same vector space, and the phrase with the nearest index vector is returned as the answer. The formulation also implies a significant scalability advantage, since the index vectors can be pre-computed and hashed offline for efficient retrieval. We experiment with baseline models for the new task, which achieve reasonable accuracy but significantly underperform unconstrained QA models. We invite the QA research community to engage in PI-QA to close the gap.
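To make the phrase-indexing formulation concrete, the following is a minimal sketch of the two-stage pipeline, not the baseline models evaluated in the paper. It assumes a toy hashed bag-of-words function (`embed`) in place of the learned document and question encoders, a brute-force inner-product search in place of hashed retrieval, and hypothetical names and parameters throughout (`build_index`, `answer`, `DIM`, `max_len`, `window`). The point it illustrates is the constraint itself: `build_index` never sees a question, so the index can be computed once offline and reused for every query.

```python
import math
import zlib

DIM = 64  # hypothetical vector size for this toy example


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]


def embed(tokens):
    """Toy stand-in for a learned encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_index(document, max_len=2, window=3):
    """Offline step: one (phrase, index vector) pair per candidate span.

    Each candidate is represented by the words around it (an assumption of
    this sketch). No question is available here, which is the PI-QA constraint.
    """
    tokens = tokenize(document)
    index = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            context = tokens[max(0, start - window):start] + tokens[end:end + window]
            index.append((" ".join(tokens[start:end]), embed(context)))
    return index


def answer(question, index):
    """Online step: encode the question alone, return the closest phrase."""
    q = embed(tokenize(question))
    # Inner product of unit vectors (cosine similarity) as the nearness measure.
    best = max(index, key=lambda entry: sum(a * b for a, b in zip(q, entry[1])))
    return best[0]


index = build_index("Hamlet was written by Shakespeare in London")
print(answer("Who wrote Hamlet?", index))
```

With a toy encoder the returned phrase is not expected to be a good answer; what matters is that all question-dependent work reduces to one encoding plus a nearest-vector lookup, which is where an approximate search structure could replace the linear scan.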