Recent advances in retrieval models based on learned sparse representations generated by transformers have led us to once again consider score-at-a-time query evaluation techniques for the top-k retrieval problem. Previous studies comparing document-at-a-time and score-at-a-time approaches have consistently found that the former yields lower mean query latency, while the latter offers more predictable latency. In our experiments with four different retrieval models that exploit representational learning with bags of words, we find that transformers generate "wacky weights" that appear to greatly reduce the opportunities for the skipping and early-exit optimizations that lie at the core of standard document-at-a-time techniques. As a result, score-at-a-time approaches appear more competitive in terms of query evaluation latency than in previous studies. We find that if an effectiveness loss of up to three percent can be tolerated, a score-at-a-time approach can yield substantial gains in mean query latency while dramatically reducing tail latency.
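To make the contrast concrete, the sketch below illustrates the score-at-a-time idea in miniature: postings are stored in impact-ordered segments and processed highest-impact first under a fixed postings budget, so per-query work (and hence tail latency) stays bounded regardless of how the weights are distributed. This is an illustrative toy under assumed names (`impact_index`, `saat_topk`, the `budget` parameter), not the system evaluated in the paper.

```python
from collections import defaultdict
from heapq import nlargest

# Toy impact-ordered index: for each query term, postings are grouped into
# segments by quantized impact score and stored highest-impact first.
# (Illustrative data, not a real index.)
impact_index = {
    "wacky":   [(7, [3, 9]), (2, [1, 4, 8])],   # (impact, [doc ids])
    "weights": [(5, [3, 5]), (1, [2, 9])],
}

def saat_topk(query_terms, k, budget):
    """Score-at-a-time evaluation: accumulate impacts segment by segment,
    highest impact first. The postings budget caps the work done per query,
    which is what makes latency predictable; lowering the budget trades
    effectiveness for speed."""
    accumulators = defaultdict(int)
    # Merge all segments across the query terms, highest impact first.
    segments = sorted(
        (seg for t in query_terms for seg in impact_index.get(t, [])),
        key=lambda s: -s[0],
    )
    processed = 0
    for impact, docs in segments:
        for doc in docs:
            if processed >= budget:  # early termination once the budget is spent
                return nlargest(k, accumulators.items(), key=lambda x: x[1])
            accumulators[doc] += impact
            processed += 1
    return nlargest(k, accumulators.items(), key=lambda x: x[1])

print(saat_topk(["wacky", "weights"], k=2, budget=6))
```

Because the traversal order depends only on impact scores and the budget, the cost of a query is insensitive to the weight distribution; a document-at-a-time evaluator, by contrast, relies on score upper bounds to skip postings, and those bounds lose their bite when learned weights are "wacky".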