Dense retrieval (DR) has the potential to resolve the query understanding challenge in conversational search by matching in the learned embedding space. However, this adaptation is challenging due to DR models' extra needs for supervision signals and the long-tail nature of conversational search. In this paper, we present a Conversational Dense Retrieval system, ConvDR, that learns contextualized embeddings for multi-turn conversational queries and retrieves documents solely using embedding dot products. In addition, we grant ConvDR few-shot ability using a teacher-student framework, where we employ an ad hoc dense retriever as the teacher, inherit its document encodings, and learn a student query encoder to mimic the teacher embeddings on oracle reformulated queries. Our experiments on TREC CAsT and OR-QuAC demonstrate ConvDR's effectiveness in both few-shot and fully-supervised settings. It outperforms previous systems that operate in the sparse word space, matches the retrieval accuracy of oracle query reformulations, and is also more efficient thanks to its simplicity. Our analyses reveal that the advantages of ConvDR come from its ability to capture informative context while ignoring the unrelated context in previous conversation rounds. This makes ConvDR more effective as conversations evolve while previous systems may get confused by the increased noise from previous turns. Our code is publicly available at https://github.com/thunlp/ConvDR.
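The two mechanisms named above, retrieval by embedding dot product and a student query encoder trained to mimic teacher embeddings, can be illustrated with a minimal sketch. It is not the ConvDR implementation: the function names `dot`, `retrieve`, and `mimic_loss` are hypothetical, the embeddings are plain Python lists standing in for encoder outputs, and mean squared error is assumed as the mimicking objective, since the abstract does not state the loss.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_emb: Vector, doc_embs: Sequence[Vector], k: int) -> List[int]:
    """Return the indices of the k documents with the highest dot-product score."""
    scores = [dot(query_emb, d) for d in doc_embs]
    ranked = sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def mimic_loss(student_emb: Vector, teacher_emb: Vector) -> float:
    """Assumed objective: mean squared error between student and teacher embeddings.

    student_emb: embedding of the raw multi-turn conversational query.
    teacher_emb: teacher embedding of the oracle reformulated query.
    """
    if len(student_emb) != len(teacher_emb):
        raise ValueError("embedding dimensions differ")
    squared = [(s - t) ** 2 for s, t in zip(student_emb, teacher_emb)]
    return sum(squared) / len(squared)


# Toy data (hypothetical): document embeddings inherited from the teacher.
doc_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
teacher_query = [0.9, 0.1]  # teacher embedding of the oracle reformulation
student_query = [0.5, 0.5]  # student embedding of the conversational query

top_docs = retrieve(teacher_query, doc_embs, k=2)
loss = mimic_loss(student_query, teacher_query)
```

Because the document embeddings stay fixed, driving `mimic_loss` toward zero moves the student's ranking toward the one the teacher produces for the oracle query. This is the sense in which the student can reuse the teacher's document index without re-encoding documents.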