A number of learned sparse and dense retrieval approaches have recently been proposed and proven effective in tasks such as passage retrieval and document retrieval. In this paper, we conduct a replicability study to analyze whether the lessons learned generalize to the retrieval of responses for dialogues, an important task in the increasingly popular field of conversational search. Unlike passage and document retrieval, where documents are usually longer than queries, in response ranking for dialogues the queries (dialogue contexts) are often longer than the documents (responses). Additionally, dialogues have a particular structure, i.e., multiple utterances by different users. With these differences in mind, we evaluate how well the following major findings from previous work generalize: (F1) query expansion outperforms a no-expansion baseline; (F2) document expansion outperforms a no-expansion baseline; (F3) zero-shot dense retrieval underperforms sparse baselines; (F4) dense retrieval outperforms sparse baselines; (F5) hard negative sampling is better than random sampling for training dense models. Our experiments, based on three different information-seeking dialogue datasets, reveal that four out of five findings (F2-F5) generalize to our domain.