We propose a design pattern for tackling text ranking problems, dubbed "Expando-Mono-Duo", that has been empirically validated on a number of ad hoc retrieval tasks in different domains. At its core, our design relies on pretrained sequence-to-sequence models within a standard multi-stage ranking architecture. "Expando" refers to the use of document expansion techniques to enrich keyword representations of texts prior to inverted indexing. "Mono" and "Duo" refer to components in a reranking pipeline based on a pointwise model and a pairwise model, respectively, that rerank initial candidates retrieved using keyword search. We present experimental results from the MS MARCO passage and document ranking tasks, the TREC 2020 Deep Learning Track, and the TREC-COVID challenge that validate our design. In all these tasks, we achieve effectiveness that is at or near the state of the art, in some cases using a zero-shot approach that does not exploit any training data from the target task. To support replicability, implementations of our design pattern are open-sourced in the Pyserini IR toolkit and PyGaggle neural reranking library.
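To make the flow of the multi-stage architecture concrete, the following is a minimal sketch in Python. It is not the Pyserini or PyGaggle API: every function name here is hypothetical, term overlap stands in for keyword search over an inverted index, and the toy scoring functions stand in for the pretrained sequence-to-sequence models. Summing pairwise preferences in the "Duo" stage is one simple aggregation choice, assumed here for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def expand(doc: str, predict_queries: Callable[[str], List[str]]) -> str:
    """Expando: append predicted queries to the text before indexing."""
    return doc + " " + " ".join(predict_queries(doc))


def keyword_retrieve(query: str, index: Dict[str, str], k: int) -> List[str]:
    """First stage: term-overlap scoring (a toy stand-in for keyword search)."""
    q = Counter(tokenize(query))
    scores = {}
    for doc_id, text in index.items():
        d = Counter(tokenize(text))
        scores[doc_id] = sum(min(q[t], d[t]) for t in q)
    # Sort by descending score, breaking ties by document id.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


def mono_rerank(query: str, candidates: List[str], docs: Dict[str, str],
                score: Callable[[str, str], float], k: int) -> List[str]:
    """Mono: pointwise reranking, one score per (query, document) pair."""
    return sorted(candidates, key=lambda i: -score(query, docs[i]))[:k]


def duo_rerank(query: str, candidates: List[str], docs: Dict[str, str],
               prefer: Callable[[str, str, str], float]) -> List[str]:
    """Duo: pairwise reranking; each document's score is the sum of its
    preference values against every other candidate (assumed aggregation)."""
    totals = {
        i: sum(prefer(query, docs[i], docs[j]) for j in candidates if j != i)
        for i in candidates
    }
    return sorted(candidates, key=lambda i: -totals[i])


# --- Toy data and toy models (all hypothetical) ---
docs = {
    "d1": "covid vaccine trial results published",
    "d2": "cooking pasta at home",
    "d3": "vaccine side effects study",
}


def toy_predict_queries(doc: str) -> List[str]:
    # Stand-in for a learned query-prediction model.
    return ["what are vaccine risks"] if "side effects" in doc else []


def toy_mono(query: str, doc: str) -> float:
    q, d = set(tokenize(query)), tokenize(doc)
    return sum(t in q for t in d) / len(d)


def toy_duo(query: str, a: str, b: str) -> float:
    sa, sb = toy_mono(query, a), toy_mono(query, b)
    return 0.5 if sa == sb else float(sa > sb)


query = "vaccine risks"
index = {i: expand(t, toy_predict_queries) for i, t in docs.items()}

unexpanded = keyword_retrieve(query, docs, k=2)   # no document expansion
stage0 = keyword_retrieve(query, index, k=2)      # with document expansion
stage1 = mono_rerank(query, stage0, docs, toy_mono, k=2)
stage2 = duo_rerank(query, stage1, docs, toy_duo)
print(unexpanded, stage0, stage2)
```

In this toy run, expansion adds the term "risks" to `d3`, so it moves ahead of `d1` in the first stage. The pairwise stage is applied only to the short list kept by the pointwise stage, because its cost grows quadratically with the number of candidates.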