A commonly observed problem of the state-of-the-art natural language technologies, such as Amazon Alexa and Apple Siri, is that their services do not extend to most developing countries' citizens due to language barriers. Such populations suffer due to the lack of available resources in their languages to build NLP products. This paper presents AllWOZ, a multilingual multi-domain task-oriented customer service dialog dataset covering eight languages: English, Mandarin, Korean, Vietnamese, Hindi, French, Portuguese, and Thai. Furthermore, we create a benchmark for our multilingual dataset by applying mT5 with meta-learning.
翻译:亚马逊亚历山大和苹果Siri等最先进的自然语言技术的一个常见问题是,由于语言障碍,其服务没有扩大到大多数发展中国家的公民,这些居民由于缺乏语言资源来制造NLP产品而受害,本文介绍了AllWOZ,这是一个多语言、多领域、面向任务的客户服务对话数据集,涵盖八种语言:英语、普通话、韩语、越南语、印地语、法语、葡萄牙语和泰语。此外,我们通过将 mT5与元学习一起应用MT5,为多语言数据集设定了一个基准。