Recent advances in dense retrieval techniques have offered the promise of being able not just to re-rank documents using contextualised language models such as BERT, but also to use such models to identify documents from the collection in the first place. However, when using dense retrieval approaches that use multiple embedded representations for each query, a large number of documents can be retrieved for each query, hindering the efficiency of the method. Hence, this work is the first to consider efficiency improvements in the context of a dense retrieval approach (namely ColBERT), by pruning query term embeddings that are estimated not to be useful for retrieving relevant documents. Our proposed query embedding pruning reduces the cost of the dense retrieval operation, as well as reducing the number of documents that are retrieved and hence need to be fully scored. Experiments conducted on the MSMARCO passage ranking corpus demonstrate that, when reducing the number of query embeddings used from 32 to 3 based on the collection frequency of the corresponding tokens, query embedding pruning results in no statistically significant differences in effectiveness, while reducing the number of documents retrieved by 70%. In terms of mean response time for the end-to-end system, this results in a 2.65x speedup.
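To make the pruning criterion concrete, the following is a minimal Python sketch of collection-frequency-based query embedding pruning. It is an illustration under stated assumptions, not the paper's implementation: the function name `prune_query_embeddings`, the precomputed collection-frequency lookup `cf`, the use of NumPy, and the treatment of tokens absent from the lookup are all choices made for this sketch.

```python
import numpy as np

def prune_query_embeddings(token_ids, embeddings, cf, k=3):
    """Keep the k query embeddings whose tokens are rarest in the collection.

    token_ids:  list[int], the query's token ids (e.g. 32 in ColBERT,
                after query augmentation)
    embeddings: np.ndarray of shape (len(token_ids), dim), one row per token
    cf:         dict mapping token id -> collection frequency
    k:          number of embeddings to retain for the retrieval stage
    """
    # Rare tokens (low collection frequency, i.e. high IDF) are assumed to be
    # the most discriminative, so rank token positions by ascending frequency.
    # Tokens missing from cf default to 0 here, i.e. they are treated as
    # maximally rare; this default is an assumption of the sketch.
    order = sorted(range(len(token_ids)),
                   key=lambda i: cf.get(token_ids[i], 0))
    keep = sorted(order[:k])  # preserve the original token order
    return embeddings[keep], [token_ids[i] for i in keep]
```

In a ColBERT-style pipeline, only the retained embeddings would then be used to issue nearest-neighbour lookups against the document embedding index, which is what reduces the number of candidate documents that must subsequently be fully scored.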