Recently, a class of task-oriented dialogue (TOD) datasets collected through Wizard-of-Oz simulated games has emerged. However, Wizard-of-Oz data are simulated and are therefore fundamentally different from real-life conversations, which are noisier and more casual. The SereTOD challenge was recently organized and released the MobileCS dataset, which consists of real-world dialogue transcripts between real users and customer-service staff from China Mobile. Based on the MobileCS dataset, the SereTOD challenge has two tasks: one evaluates the construction of the dialogue system itself, and the other examines information extraction from dialogue transcripts, which is crucial for building the knowledge base for TOD. This paper presents a baseline study of the two tasks on the MobileCS dataset. We describe how the two baselines are constructed, the problems encountered, and the results. We anticipate that the baselines can facilitate exciting future research on building human-robot dialogue systems for real-life tasks.