Learning to search is the task of building artificial agents that learn to autonomously use a search box to find information. So far, current language models have been shown to learn symbolic query reformulation policies, in combination with traditional term-based retrieval, but they fall short of outperforming neural retrievers. We extend the previous learning-to-search setup to a hybrid environment that accepts discrete query refinement operations after a first-pass retrieval step performed by a dual encoder. Experiments on the BEIR benchmark show that search agents trained via behavioral cloning outperform the underlying search system, which combines a dual-encoder retriever with a cross-encoder reranker. Furthermore, we find that simple heuristic Hybrid Retrieval Environments (HRE) can improve baseline performance by several nDCG points. The search agent built on HRE (HARE) matches state-of-the-art performance, balanced across both zero-shot and in-domain evaluations, via interpretable actions, and at twice the speed.