Sparse and Dense Approaches for the Full-rank Retrieval of Responses for Dialogues

Ranking responses for a given dialogue context is a popular benchmark whose standard setup is to re-rank a limited set of $n$ responses that contains the ground-truth response, where $n$ is typically 10. The predominance of this setup in conversation response ranking has led to a great deal of attention on building neural re-rankers, while the first-stage retrieval step has been overlooked. Since the correct answer is always available in the candidate list of $n$ responses, this artificial evaluation setup assumes a first-stage retrieval step that is always able to rank the correct response in its top-$n$ list. In this paper we focus on the more realistic task of full-rank retrieval of responses, where $n$ can be up to millions of responses. We investigate both dialogue context and response expansion techniques for sparse retrieval, as well as zero-shot and fine-tuned dense retrieval approaches. Our findings, based on three different information-seeking dialogue datasets, reveal that a learned response expansion technique is a solid baseline for sparse retrieval. We find the best performing method overall to be dense retrieval with intermediate training, i.e., a step after the language model pre-training where sentence representations are learned, followed by fine-tuning on the target conversational data. We also investigate the intriguing phenomenon that harder negative sampling techniques lead to worse results for the fine-tuned dense retrieval models. The code and datasets are available at https://github.com/Guzpenha/transformer_rankers/tree/full_rank_retrieval_dialogues.
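To make the contrast between the two evaluation setups concrete, here is a minimal sketch; it is not taken from the paper's repository. The token-overlap scorer is a toy stand-in for a sparse retriever (it is not BM25), and the five-response corpus, the dialogue context, and all function names are illustrative assumptions.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def overlap_score(context, response):
    """Toy sparse score: count of token occurrences shared by both texts."""
    c, r = Counter(tokenize(context)), Counter(tokenize(response))
    return sum(min(c[t], r[t]) for t in c)


def rank(context, candidates):
    """Return candidate indices ordered by descending score (ties keep input order)."""
    scores = [overlap_score(context, cand) for cand in candidates]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])


def recall_at_k(ranking, relevant_index, k):
    """1.0 if the relevant item appears in the top-k of the ranking, else 0.0."""
    return 1.0 if relevant_index in ranking[:k] else 0.0


# Hypothetical response collection; index 0 is the ground-truth response.
corpus = [
    "try rolling back the driver to the previous version",
    "my wifi keeps dropping too",
    "after the update my wifi is slow",
    "the update keeps failing",
    "install it with apt",
]
context = "my wifi keeps dropping after the update"

# Full-rank retrieval: score every response in the collection.
full_ranking = rank(context, corpus)
full_recall_at_3 = recall_at_k(full_ranking, 0, k=3)

# Re-ranking setup: the ground truth is guaranteed to be among the candidates.
candidates = [corpus[0], corpus[4]]
rerank_ranking = rank(context, candidates)
rerank_recall_at_1 = recall_at_k(rerank_ranking, 0, k=1)

print("full-rank order:", full_ranking, "R@3:", full_recall_at_3)
print("re-rank order:", rerank_ranking, "R@1:", rerank_recall_at_1)
```

In this toy case the ground-truth response shares little vocabulary with the context, so it falls outside the top 3 when the whole collection is ranked, yet it is trivially ranked first in the small candidate list. This is the gap between the two setups that the abstract describes.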
