Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

Pseudo-Relevance Feedback (PRF) utilises the relevance signals from the top-k passages returned by a first round of retrieval to perform a second round of retrieval, with the aim of improving search effectiveness. A recent research direction has been the study and development of PRF methods for rankers based on deep language models, in particular dense retrievers. Compared to more complex neural rankers, dense retrievers offer a trade-off: their effectiveness is often lower, but so is their query latency, which makes the retrieval pipeline more efficient. PRF methods for dense retrievers have been introduced as an attempt to further improve their effectiveness. In this paper, we reproduce and study a recent method for PRF with dense retrievers, called ANCE-PRF. This method concatenates the query text with the text of the top-k feedback passages to form a new query input, which is then encoded into a dense representation by a newly trained query encoder based on the original dense retriever used for the first round of retrieval. While the method can potentially be applied to any existing dense retriever, prior work has studied it only in the context of the ANCE dense retriever. We study the reproducibility of ANCE-PRF in terms of both its training (encoding of the PRF signal) and inference (ranking) steps. We further extend the empirical analysis of the original work to investigate the effect of the hyper-parameters that govern the training process and the robustness of the method across these different settings. Finally, we contribute a study of the generalisability of the ANCE-PRF method when dense retrievers other than ANCE are used for the first round of retrieval and for encoding the PRF signal.
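The two-round pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_encode` is a hypothetical hashed bag-of-words stand-in for a real dense encoder such as ANCE, the ` [SEP] ` separator is an assumed way of joining the query with the feedback passages, and all function names and parameters (`build_prf_input`, `prf_retrieve`, `k_feedback`, `k_final`) are invented for the example. In the actual method the second-round query encoder is a separately trained model, whereas the sketch simply accepts any callable.

```python
import hashlib
import math

DIM = 64  # toy embedding size (assumption, chosen only for the example)


def toy_encode(text):
    """Hypothetical stand-in for a dense encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        if token == "[sep]":  # ignore the separator marker itself
            continue
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    """Inner product, the similarity typically used by dense retrievers."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, passage_vecs, k):
    """Return the indices of the k passages with the highest inner product."""
    ranked = sorted(
        range(len(passage_vecs)),
        key=lambda i: dot(query_vec, passage_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def build_prf_input(query, feedback_passages, sep=" [SEP] "):
    """Concatenate the query text with the text of the feedback passages."""
    return sep.join([query] + list(feedback_passages))


def prf_retrieve(query, passages, encode_query, encode_prf_query,
                 encode_passage, k_feedback=3, k_final=5):
    """Two rounds of retrieval: plain query first, then the PRF-expanded query."""
    passage_vecs = [encode_passage(p) for p in passages]
    # Round 1: retrieve feedback passages with the original query encoder.
    first = retrieve(encode_query(query), passage_vecs, k_feedback)
    # Form the new query input from the query and the top-k feedback passages.
    prf_input = build_prf_input(query, [passages[i] for i in first])
    # Round 2: encode the new input (with the PRF query encoder) and search again.
    second = retrieve(encode_prf_query(prf_input), passage_vecs, k_final)
    return first, prf_input, second
```

Passing the encoders in as callables mirrors the generalisability question raised in the abstract: swapping ANCE for another dense retriever amounts to changing which encoders are supplied, while the passage index and the retrieval step stay the same.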
