Conversational search (CS) requires a holistic understanding of conversational inputs to retrieve relevant passages. In this paper, we demonstrate the existence of a retrieval shortcut in CS, which causes models to retrieve passages by relying solely on partial conversation history while disregarding the latest question. Through in-depth analysis, we first show that naively trained dense retrievers heavily exploit this shortcut and hence perform poorly when asked to answer history-independent questions. To build models that are more robust against shortcut dependency, we explore various hard negative mining strategies. Experimental results show that training with model-based hard negatives effectively mitigates dependency on the shortcut, significantly improving dense retrievers on recent CS benchmarks. In particular, our retriever outperforms the previous state-of-the-art model by 11.0 in Recall@10 on QReCC.