Robust state tracking for task-oriented dialogue systems remains restricted to a few popular languages. This paper shows that, given a large-scale dialogue dataset in one language, we can automatically produce an effective semantic parser for other languages using machine translation. We propose automatically translating dialogue datasets with alignment, which ensures faithful translation of slot values and eliminates the costly human supervision used in previous benchmarks. We also propose a new contextual semantic parsing model that encodes the formal slots and values together with only the last agent and user utterances. We show that this succinct representation reduces the compounding effect of translation errors without harming accuracy in practice. We evaluate our approach on several dialogue state tracking benchmarks. On the RiSAWOZ, CrossWOZ, CrossWOZ-EN, and MultiWOZ-ZH datasets, we improve the state-of-the-art joint goal accuracy by 11%, 17%, 20%, and 0.3%, respectively. We present a comprehensive error analysis for all three datasets, showing that erroneous annotations can lead to misguided judgments about the quality of the model. Finally, we present RiSAWOZ English and German datasets, created using our translation methodology. On these datasets, accuracy is within 11% of that on the original, showing that high-accuracy multilingual dialogue datasets are possible without relying on expensive human annotations. We release our datasets and software as open source.
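To make the succinct input representation and the evaluation metric concrete, here is a minimal sketch. It assumes a dialogue state is a flat mapping from slot names to values, and it builds a parser input from the previous formal state plus only the last agent and user utterances, rather than the full history. The serialization format, the `<state>`/`<agent>`/`<user>` marker tokens, the slot names, and the helper function names are all hypothetical illustrations, not the paper's actual implementation. Joint goal accuracy is computed in its standard form: the fraction of turns whose predicted state exactly matches the gold state.

```python
from typing import Dict, List

# Hypothetical representation: a dialogue state maps slot names to values.
State = Dict[str, str]


def serialize_state(state: State) -> str:
    """Render the formal dialogue state as a flat 'slot = value' string."""
    # Sorting by slot name makes the output independent of insertion order.
    return " ; ".join(f"{slot} = {value}" for slot, value in sorted(state.items()))


def build_parser_input(prev_state: State, agent_utt: str, user_utt: str) -> str:
    """Build a succinct input: the previous formal state plus only the last
    agent and user utterances (no earlier turns are included).

    The marker tokens used here are hypothetical placeholders.
    """
    return f"<state> {serialize_state(prev_state)} <agent> {agent_utt} <user> {user_utt}"


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state exactly matches the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


if __name__ == "__main__":
    state = {"hotel-stars": "4", "hotel-area": "north"}
    print(build_parser_input(state, "Which price range do you prefer?", "Something cheap, please."))
    # One of two turns matches exactly, so this prints 0.5.
    print(joint_goal_accuracy(
        [{"hotel-area": "north"}, {"hotel-area": "north", "hotel-price": "cheap"}],
        [{"hotel-area": "north"}, {"hotel-area": "north", "hotel-price": "moderate"}],
    ))
```

Because the input carries the formal state instead of the raw history, a mistranslated utterance from an early turn is not re-read at every later turn; only the (language-independent) slot values persist. This is one way to read the abstract's claim about reducing compounding translation errors, stated here as an interpretation rather than the authors' analysis.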