State-of-the-art neural models typically re-rank document-query pairs by encoding them jointly with cross-attention, generally using either an encoder-only paradigm (like BERT) or an encoder-decoder approach (like T5). These paradigms, however, are not without flaws: running the model on every query-document pair at inference time incurs a significant computational cost. This paper proposes a new training and inference paradigm for re-ranking. We propose to finetune a pretrained encoder-decoder model on a document-to-query generation task. Subsequently, we show that this encoder-decoder architecture can be decomposed into a decoder-only language model at inference time. This yields significant inference-time speedups, since the decoder-only architecture only needs to interpret static encoder embeddings, which can be precomputed per document. Our experiments show that this new paradigm achieves results comparable to the more expensive cross-attention ranking approaches while being up to 6.8x faster. We believe this work paves the way for more efficient neural rankers that leverage large pretrained models.
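The inference paradigm described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: random numpy weights stand in for a pretrained encoder-decoder (the paper uses a model such as T5), and all dimensions, function names, and the single-step decoder state are illustrative assumptions. The key structural point it demonstrates is that encoder embeddings are computed once per document offline, while query-time scoring is a lightweight decoder-only pass over those cached states.

```python
import numpy as np

# Toy parameters standing in for a pretrained model (illustrative assumption).
rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16

embed = rng.normal(size=(VOCAB, DIM))                 # shared token embeddings
W_q = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)      # cross-attention query proj.
W_k = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)      # cross-attention key proj.
W_v = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)      # cross-attention value proj.
W_out = rng.normal(size=(DIM, VOCAB)) / np.sqrt(DIM)  # output vocabulary proj.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def encode_document(doc_tokens):
    # Offline step: the (stand-in) encoder maps a document to static
    # embeddings that are cached once per document.
    return embed[doc_tokens]  # shape (doc_len, DIM)

def query_log_likelihood(enc_states, query_tokens):
    # Online step: a decoder-only pass reads the cached encoder states via
    # cross-attention and scores log P(query | document) with teacher forcing.
    h = np.zeros(DIM)  # running decoder state (a single, greatly simplified step)
    logp = 0.0
    keys, values = enc_states @ W_k, enc_states @ W_v
    for tok in query_tokens:
        att = softmax(keys @ (h @ W_q) / np.sqrt(DIM))  # attend over the document
        context = att @ values
        logits = context @ W_out
        logp += np.log(softmax(logits)[tok])
        h = embed[tok]  # feed the gold query token back in
    return logp

def rerank(doc_cache, query_tokens):
    # Rank cached documents by the likelihood they assign to the query.
    scores = {doc_id: query_log_likelihood(enc, query_tokens)
              for doc_id, enc in doc_cache.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Cache encoder states for three toy "documents", then rank them for one query.
docs = {f"d{i}": rng.integers(0, VOCAB, size=12) for i in range(3)}
cache = {doc_id: encode_document(toks) for doc_id, toks in docs.items()}
ranking, scores = rerank(cache, query_tokens=[3, 17, 42])
```

Note that only `encode_document` ever touches the full document, so its cost is paid once offline; each query then costs one short decoder pass per candidate document, which is the source of the speedup the abstract describes.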