This paper presents a pre-training technique called query-as-context that uses query prediction to improve dense retrieval. Previous research has applied query prediction to document expansion in order to alleviate the problem of lexical mismatch in sparse retrieval. However, query prediction has not yet been studied in the context of dense retrieval. Query-as-context pre-training treats the predicted query as a special context for the document and uses contrastive learning or contextual masked auto-encoding to compress the document and query into dense vectors. The technique is evaluated on large-scale passage retrieval benchmarks and shows considerable improvements over strong baselines such as coCondenser and CoT-MAE, demonstrating its effectiveness. Our code will be available at https://github.com/caskcsg/ir/tree/main/cotmae-qc.
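The contrastive objective mentioned above can be illustrated with a minimal sketch: each document is paired with its predicted query as the positive, and the other queries in the batch serve as in-batch negatives (a standard InfoNCE loss). The vector dimensions, batch size, and temperature below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def info_nce_loss(doc_vecs, query_vecs, temperature=0.05):
    """Contrastive loss pairing each document with its predicted query.

    doc_vecs, query_vecs: (batch, dim) L2-normalized dense vectors.
    The i-th query is the positive for the i-th document; all other
    queries in the batch act as in-batch negatives.
    """
    sims = doc_vecs @ query_vecs.T / temperature       # (batch, batch) similarity matrix
    sims -= sims.max(axis=1, keepdims=True)            # numerical stability for exp
    log_softmax = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))              # positives lie on the diagonal

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy data: each "predicted query" vector is a small perturbation of its document,
# standing in for the encoder outputs the paper would produce.
rng = np.random.default_rng(0)
docs = l2_normalize(rng.standard_normal((4, 8)))
queries = l2_normalize(docs + 0.1 * rng.standard_normal((4, 8)))

loss = info_nce_loss(docs, queries)
```

Minimizing this loss pulls each document's dense vector toward its predicted query and pushes it away from unrelated queries, which is the sense in which the query acts as a "special context" for the document.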