End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to falling into the so-called 'likelihood trap', producing responses that are dull, repetitive, and often inconsistent with the dialogue history. Comparing ranked lists of multiple generated responses against the 'gold response' (from training data) reveals a wide diversity in response quality, with many good responses placed lower in the ranked list. The main challenge addressed in this work is therefore how to reach beyond greedily generated system responses, that is, how to obtain and select such high-quality responses from the list of overgenerated responses at inference, when the gold response is not available. To this end, we propose a simple yet effective reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system. The idea is to use any sequence-level (similarity) scoring function to divide the semantic space of responses into high-scoring and low-scoring partitions. At training, the high-scoring partition comprises all generated responses whose similarity to the gold response is higher than the similarity of the greedy response to the gold response. At inference, the aim is to estimate the probability that each overgenerated response belongs to the high-scoring partition, given only the previous dialogue history. We validate the robustness and versatility of the proposed method on the standard MultiWOZ dataset: it improves a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2 ROUGE, and 2.8 METEOR points, achieving new peak results. Additional experiments on the BiTOD dataset and a human evaluation further confirm the generalisability and effectiveness of the proposed framework.
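The partitioning rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the unigram F1 scorer `token_f1` is only a stand-in for "any sequence-level similarity scoring function", and the function names `label_partitions` and `select_response`, as well as the example probabilities, are hypothetical. At training time each overgenerated response is labelled 1 if it scores higher against the gold response than the greedy response does; at inference, the response with the highest estimated probability of belonging to the high-scoring partition is selected.

```python
from collections import Counter
from typing import Callable, List


def token_f1(hypothesis: str, reference: str) -> float:
    """Unigram F1 overlap; an assumed stand-in for any sequence-level score."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    if not hyp or not ref:
        return 0.0
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def label_partitions(
    gold: str,
    greedy: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = token_f1,
) -> List[int]:
    """Training-time labels: 1 if a candidate beats the greedy response
    in similarity to the gold response, else 0."""
    threshold = score_fn(greedy, gold)
    return [1 if score_fn(c, gold) > threshold else 0 for c in candidates]


def select_response(candidates: List[str], high_partition_probs: List[float]) -> str:
    """Inference-time reranking: pick the candidate with the highest
    estimated probability of lying in the high-scoring partition.
    The probabilities would come from a trained classifier (not shown)."""
    best = max(range(len(candidates)), key=lambda i: high_partition_probs[i])
    return candidates[best]


if __name__ == "__main__":
    gold = "the hotel is in the north and costs 50 pounds"
    greedy = "the hotel is nice"
    candidates = [
        "the hotel is in the north",
        "i do not know",
        "the hotel is in the north and costs 50 pounds",
    ]
    print(label_partitions(gold, greedy, candidates))
    # Hypothetical classifier outputs, for illustration only.
    print(select_response(candidates, [0.2, 0.1, 0.7]))
```

The labels depend only on the relative ordering against the greedy response, so any other scorer (for example a BLEU or embedding-based similarity) could be passed as `score_fn` without changing the rest of the sketch.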