Most prior work in dialogue modeling has focused on written conversations, largely because of the data sets available. However, written dialogues cannot fully capture the nature of spoken conversations, nor the speech recognition errors that arise in practical spoken dialogue systems. This work presents a new benchmark on spoken task-oriented conversations, intended for studying multi-domain dialogue state tracking and knowledge-grounded dialogue modeling. As expected, existing state-of-the-art models trained on written conversations do not perform well on our spoken data. Furthermore, we observe improvements in task performance when leveraging n-best speech recognition hypotheses, for example by combining predictions based on individual hypotheses. Our data set enables speech-based benchmarking of task-oriented dialogue systems.
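
Combining predictions over n-best hypotheses can be illustrated with a minimal sketch. The abstract does not specify the combination scheme, so the weighted averaging below is an assumption, and the names `combine_nbest_predictions` and `toy_predict`, the example utterances, and the scores are all hypothetical. Each hypothesis is scored separately by a model, and the per-label probabilities are averaged using the normalized recognizer scores as weights.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def combine_nbest_predictions(
    hypotheses: List[Tuple[str, float]],
    predict: Callable[[str], Dict[str, float]],
) -> str:
    """Return the label with the highest hypothesis-weighted score.

    hypotheses: (transcript, recognizer score) pairs; scores are assumed
    to be non-negative and are normalized here to sum to 1.
    predict: hypothetical model mapping a transcript to label probabilities.
    """
    total = sum(weight for _, weight in hypotheses)
    if total <= 0:
        raise ValueError("need at least one hypothesis with positive score")
    scores: Dict[str, float] = defaultdict(float)
    for text, weight in hypotheses:
        for label, prob in predict(text).items():
            scores[label] += (weight / total) * prob
    return max(scores, key=scores.get)


def toy_predict(text: str) -> Dict[str, float]:
    # Hypothetical stand-in for a trained dialogue state tracker.
    if "cheap" in text:
        return {"price=cheap": 0.9, "price=none": 0.1}
    return {"price=cheap": 0.2, "price=none": 0.8}


# Made-up n-best list in which the top hypothesis contains a recognition error.
nbest = [
    ("i want a sheep hotel", 0.5),
    ("i want a cheap hotel", 0.3),
    ("i want cheap hotel", 0.2),
]
top1_only = combine_nbest_predictions(nbest[:1], toy_predict)
combined = combine_nbest_predictions(nbest, toy_predict)
```

In this toy case the top hypothesis alone yields `price=none`, while the combined score favors `price=cheap` because the lower-ranked hypotheses together outweigh the misrecognized one. Other schemes, such as majority voting or feeding all hypotheses to the model jointly, would be equally consistent with the abstract.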