We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries. We aim to develop a general-purpose task-aware retrieval system using multi-task instruction tuning, which can follow human-written instructions to find the best documents for a given query. We introduce BERRI, the first large-scale collection of approximately 40 retrieval datasets with instructions, and present TART, a multi-task retrieval system trained on BERRI with instructions. TART shows strong capabilities to adapt to a new retrieval task via instructions and advances the state of the art on two zero-shot retrieval benchmarks, BEIR and LOTTE, outperforming models up to three times larger. We further introduce a new evaluation setup, X^2-Retrieval, to better reflect real-world scenarios, where diverse domains and tasks are pooled and a system needs to find documents that align with users' intents. In this setup, TART significantly outperforms competitive baselines, further demonstrating the effectiveness of guiding retrieval with instructions.
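To make the idea of retrieval with instructions concrete, the following is a minimal sketch, not TART itself. It assumes the instruction is simply concatenated with the query before encoding, and it swaps in a toy bag-of-words encoder with cosine similarity in place of a trained neural retriever. The function names (`embed`, `cosine`, `retrieve`), the instructions, and the two-document pooled corpus are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a learned encoder: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(instruction, query, corpus):
    """Return the document that best matches the instruction plus the query."""
    # Assumption: the instruction is prepended to the query text.
    q = embed(f"{instruction} {query}")
    return max(corpus, key=lambda doc: cosine(q, embed(doc)))

# Hypothetical pooled corpus mixing two tasks (question matching and answering).
corpus = [
    "question: how can i reverse a list in python",
    "answer: use slicing to reverse a list in python",
]
query = "reverse a list in python"

# The same query returns different documents depending on the stated intent.
print(retrieve("find a similar question", query, corpus))
print(retrieve("find an answer", query, corpus))
```

In this toy setting the two documents tie on the bare query, so only the instruction tokens separate them. A trained system would be expected to capture intent without relying on literal word overlap.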