NIR-Prompt: A Multi-task Generalized Neural Information Retrieval Training Framework

Information retrieval (IR) aims to find information in a corpus that meets users' needs. Different needs correspond to different IR tasks, such as document retrieval, open-domain question answering, and retrieval-based dialogue, yet all of these tasks share the same schema: estimating the relationship between texts. This suggests that a good IR model should generalize across tasks and domains. However, previous studies indicate that state-of-the-art neural information retrieval (NIR) models, e.g., pre-trained language models (PLMs), generalize poorly, mainly because the end-to-end fine-tuning paradigm makes the model overemphasize task-specific signals and domain biases and lose the ability to capture generalized essential signals. To address this problem, we propose NIR-Prompt, a novel NIR training framework for the retrieval and reranking stages, based on the idea of decoupling signal capturing from signal combination. NIR-Prompt uses an Essential Matching Module (EMM) to capture the essential matching signals and a Matching Description Module (MDM) to obtain a description of each task. The description serves as task-adaptation information that guides how the essential matching signals are combined for each task. Experiments under in-domain multi-task, out-of-domain multi-task, and new-task adaptation settings show that NIR-Prompt improves the generalization of PLMs in NIR for both the retrieval and reranking stages compared with baselines.
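
The idea of decoupling signal capturing from signal combination can be illustrated with a deliberately small sketch. This is not the paper's implementation: the abstract describes PLM-based modules, whereas the code below swaps in simple token-overlap features as a stand-in for the shared matching signals and a hand-set weight vector as a stand-in for a task description. The names `essential_signals`, `TaskDescription`, `score`, and `rank`, as well as the weights, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


def essential_signals(query: str, text: str) -> list[float]:
    """Task-agnostic matching signals (toy stand-in for a shared module)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    if not q or not t:
        return [0.0, 0.0, 0.0]
    shared = len(q & t)
    return [
        shared / len(q),      # fraction of query tokens found in the text
        shared / len(t),      # fraction of text tokens found in the query
        shared / len(q | t),  # Jaccard overlap of the two token sets
    ]


@dataclass(frozen=True)
class TaskDescription:
    """Hypothetical task description: one weight per essential signal."""
    name: str
    weights: tuple[float, float, float]


def score(query: str, text: str, task: TaskDescription) -> float:
    """Combine the shared signals using only the task description."""
    signals = essential_signals(query, text)
    return sum(w * s for w, s in zip(task.weights, signals))


def rank(query: str, candidates: list[str], task: TaskDescription) -> list[str]:
    """Order candidates by descending task-specific score."""
    return sorted(candidates, key=lambda c: score(query, c, task), reverse=True)


# Assumed, hand-set descriptions; a real system would learn these.
DOC_RETRIEVAL = TaskDescription("document_retrieval", (1.0, 0.0, 0.0))
DIALOGUE = TaskDescription("retrieval_based_dialogue", (0.0, 1.0, 0.0))

if __name__ == "__main__":
    query = "neural retrieval"
    candidates = [
        "neural retrieval models generalize poorly across many different tasks",
        "neural ranking",
    ]
    for task in (DOC_RETRIEVAL, DIALOGUE):
        print(task.name, "->", rank(query, candidates, task)[0])
```

In this sketch the signal extractor is shared by every task and only the description changes, so adapting to a new task means supplying a new description rather than retraining the extractor. That mirrors the separation the abstract describes, under the simplifying assumptions stated above.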
