Latency and efficiency issues are often overlooked when evaluating IR models based on Pretrained Language Models (PLMs), due to the multiple hardware and software testing scenarios involved. Nevertheless, efficiency is an important part of such systems and should not be neglected. In this paper, we focus on improving the efficiency of the SPLADE model, since it has achieved state-of-the-art zero-shot performance and competitive results on TREC collections. SPLADE efficiency can be controlled via a regularization factor, but solely adjusting this regularization has been shown to be insufficient. In order to reduce the latency gap between SPLADE and traditional retrieval systems, we propose several techniques, including L1 regularization for queries, a separation of document/query encoders, a FLOPS-regularized middle-training, and the use of faster query encoders. Our benchmark demonstrates that we can drastically improve the efficiency of these models while increasing the performance metrics on in-domain data. To our knowledge, {we propose the first neural models that, under the same computing constraints, \textit{achieve similar latency (less than 4\,ms difference) to traditional BM25}, while having \textit{similar performance (less than 10\% MRR@10 reduction) to} the state-of-the-art single-stage neural rankers on in-domain data}.
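For reference, a minimal sketch of the two sparsity regularizers mentioned above, following the formulation used in the SPLADE line of work (the notation here is ours and may differ slightly from the body of the paper):
\[
\ell_{\mathrm{FLOPS}} \;=\; \sum_{j \in V} \Bigg( \frac{1}{N} \sum_{i=1}^{N} w_j^{(d_i)} \Bigg)^{2},
\qquad
\ell_{L1}(q) \;=\; \sum_{j \in V} \big| w_j^{(q)} \big|,
\]
where $V$ is the vocabulary, $w_j^{(d_i)}$ is the weight assigned to token $j$ for document $d_i$ in a batch of size $N$, and $w_j^{(q)}$ is the weight of token $j$ in the query representation. The query-side L1 term replaces the FLOPS term on queries, encouraging sparser query vectors and hence lower retrieval latency.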