Task-oriented dialogue (ToD) benchmarks provide an important avenue for measuring progress and developing better conversational agents. However, existing datasets for end-to-end ToD modeling are limited to a single language, which hinders the development of robust end-to-end ToD systems for multilingual countries and regions. Here we introduce BiToD, the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base. It serves as an effective benchmark for evaluating bilingual ToD systems and cross-lingual transfer learning approaches. We provide state-of-the-art baselines under three evaluation settings (monolingual, bilingual, and cross-lingual). Analysis of these baselines across the settings highlights 1) the effectiveness of training a bilingual ToD system compared with two independent monolingual ToD systems, and 2) the potential of leveraging a bilingual knowledge base and cross-lingual transfer learning to improve system performance under low-resource conditions.