The retriever-reader pipeline performs well in open-domain QA but suffers from slow inference. Recently proposed question retrieval models address this problem by indexing question-answer pairs and searching for stored questions similar to the input question. These models are significantly faster at inference, but at the cost of lower QA performance than retriever-reader models. This paper proposes SQuID (Sequential Question-Indexed Dense retrieval), a two-step question retrieval model, together with distant supervision for training. SQuID uses two bi-encoders for question retrieval: the first-step retriever selects the top-k similar questions, and the second-step retriever finds the most similar question among those top-k candidates. We evaluate both the performance and the computational efficiency of SQuID. The results show that SQuID significantly improves on the performance of existing question retrieval models with a negligible loss in inference speed.
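The two-step lookup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that both bi-encoders have already produced embeddings for the query and for every indexed question, that similarity is an inner product (the abstract does not name the scoring function), and the function name, parameter names, and toy vectors are all hypothetical.

```python
import numpy as np


def two_step_retrieve(q_first, q_second, index_first, index_second, k):
    """Return the position of the best-matching indexed question.

    q_first / q_second: query embeddings from the first- and second-step
    encoders (hypothetical names). index_first / index_second: one row per
    indexed question, embedded by the matching encoder.
    Inner-product scoring is an assumption of this sketch.
    """
    k = max(1, min(k, index_first.shape[0]))

    # Step 1: score every indexed question and keep the k best (unordered).
    first_scores = index_first @ q_first
    candidates = np.argpartition(-first_scores, k - 1)[:k]

    # Step 2: rescore only those k candidates with the second encoder.
    second_scores = index_second[candidates] @ q_second
    return int(candidates[np.argmax(second_scores)])


# Toy data, invented purely for illustration: four indexed questions.
index_first = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
index_second = np.array([[0.2, 0.8], [0.9, 0.1], [1.0, 0.0], [0.0, 1.0]])
answers = ["answer 0", "answer 1", "answer 2", "answer 3"]

q_first = np.array([1.0, 0.0])
q_second = np.array([1.0, 0.0])

best = two_step_retrieve(q_first, q_second, index_first, index_second, k=2)
print(answers[best])
```

In this toy setup, the second-step scores are computed over only k rows rather than the whole index, which is one plausible reading of why the extra step costs little at inference time. Note that question 2 has the highest second-step score overall but is never considered when k=2, because the first step filtered it out.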