Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on demand for every input query. In this paper, we introduce a query-agnostic indexable representation of document phrases that can drastically speed up open-domain QA and also allows us to reach long-tail targets. In particular, our dense-sparse phrase encoding effectively captures the syntactic, semantic, and lexical information of each phrase and eliminates the pipeline filtering of context documents. Leveraging optimization strategies, our model can be trained on a single 4-GPU server and can serve the entire Wikipedia (up to 60 billion phrases) in under 2TB with CPUs only. Our experiments on SQuAD-Open show that our model is more accurate than DrQA (Chen et al., 2017) while reducing computational cost by 6000x, which translates into an end-to-end inference speedup of at least 58x on CPUs.
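To make the retrieval setup concrete, the sketch below illustrates the general idea of a query-agnostic dense-sparse phrase index using toy stand-in components; `encode_dense`, `encode_sparse`, `SPARSE_WEIGHT`, and the tiny corpus are all hypothetical placeholders, not the paper's model. The key property it demonstrates is that every candidate phrase is encoded once, offline, so answering a question reduces to a single maximum inner product search with no per-query document reading.

```python
# Minimal runnable sketch of a query-agnostic dense-sparse phrase index,
# assuming toy stand-in encoders (NOT the paper's implementation).
import zlib
import numpy as np

DENSE_DIM = 256      # toy dense dimensionality
SPARSE_WEIGHT = 2.0  # hypothetical scalar balancing the two views

def tokens(text):
    return [t.strip(".,?!") for t in text.lower().split() if t.strip(".,?!")]

def encode_dense(text):
    """Toy stand-in for a learned dense encoder: a deterministic
    pseudo-random unit vector (a real system learns this part)."""
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    v = rng.standard_normal(DENSE_DIM)
    return v / np.linalg.norm(v)

def encode_sparse(text, vocab):
    """Toy stand-in for a sparse lexical (tf-idf style) vector:
    an L2-normalized bag of words over a fixed vocabulary."""
    v = np.zeros(len(vocab))
    for t in tokens(text):
        if t in vocab:
            v[vocab[t]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0.0 else v

# Offline indexing: (phrase, context) pairs are encoded once, independently
# of any query. The sparse part comes from the phrase's context, mimicking
# document-level lexical features attached to each phrase.
corpus = [
    ("Honolulu, Hawaii", "Barack Obama was born in Honolulu, Hawaii."),
    ("August 4, 1961", "Obama was born on August 4, 1961."),
    ("Columbia University", "Obama graduated from Columbia University."),
]
vocab = {}
for _, ctx in corpus:
    for t in tokens(ctx):
        vocab.setdefault(t, len(vocab))
index = np.stack([
    np.concatenate([encode_dense(span), SPARSE_WEIGHT * encode_sparse(ctx, vocab)])
    for span, ctx in corpus
])

# Online querying: encode the question the same way and take the arg-max
# inner product. With these toy encoders the match is driven by lexical
# overlap; with learned encoders the dense part carries semantic matching.
question = "Where was Barack Obama born?"
q = np.concatenate([encode_dense(question),
                    SPARSE_WEIGHT * encode_sparse(question, vocab)])
print(corpus[int(np.argmax(index @ q))][0])  # -> Honolulu, Hawaii
```

In a real deployment at Wikipedia scale, the brute-force `index @ q` step would be replaced by an approximate maximum inner product search library, which is what makes CPU-only serving of billions of phrases feasible.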