Vector-based retrieval systems have become a staple of academic and industrial search applications because they offer a simple and scalable way to extend search with contextual representations of documents and queries. Because these vector-based systems rely on contextual language models, they typically require GPUs, which can be expensive and difficult to manage. Given recent advances in introducing sparsity into language models for more efficient inference, in this paper we study how sparse language models can be used for dense retrieval to improve inference efficiency. Using the popular retrieval library Tevatron and the MSMARCO, NQ, and TriviaQA datasets, we find that sparse language models can be used as direct replacements for their dense counterparts with little to no drop in accuracy and up to 4.3x faster inference.
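As context for the substitution the abstract describes, the following is a minimal sketch of the dense retrieval scoring such systems rely on, written with Hugging Face Transformers rather than Tevatron itself; the checkpoint name is a placeholder, and the point is only that a pruned or otherwise sparsified encoder can be dropped in wherever the dense encoder is loaded.

```python
# Minimal sketch: dense retrieval scoring with an interchangeable encoder.
# The checkpoint name is a placeholder; a sparse/pruned BERT-compatible
# variant could be substituted here without changing the rest of the code,
# which is the kind of drop-in replacement the paper studies.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # placeholder; swap in a sparse encoder

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def encode(texts):
    """Encode texts into dense vectors using the [CLS] representation."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (batch, seq, dim)
    return hidden[:, 0]  # [CLS] pooling, as in DPR-style retrievers

query_vec = encode(["who wrote the declaration of independence"])
passage_vecs = encode([
    "Thomas Jefferson drafted the Declaration of Independence.",
    "The Eiffel Tower is located in Paris.",
])

# Relevance is the dot product between query and passage embeddings.
scores = query_vec @ passage_vecs.T
print(scores)
```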