Data-driven task-oriented dialogue systems have developed rapidly thanks to large-scale datasets. However, progress on dialogue systems for low-resource languages lags far behind because high-quality data is lacking. To advance cross-lingual technology for building dialogue systems, DSTC9 introduces the task of cross-lingual dialogue state tracking (DST), in which the DST module is tested in a low-resource language given a training dataset in a resource-rich language. This paper studies the transferability of a cross-lingual generative dialogue state tracking system built on a multilingual pre-trained seq2seq model. We experiment under different settings, including joint training and pre-training on cross-lingual and cross-ontology datasets. We find that our approaches have low cross-lingual transferability, and we provide an investigation and discussion of this result.
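To make the phrase "generative dialogue state tracking" concrete, the sketch below shows one way the task can be cast as text-to-text: the dialogue history becomes the source string and the dialogue state is flattened into a target string that a seq2seq model would be trained to generate. The abstract does not specify the serialization format, so the function names (`serialize_history`, `serialize_state`, `parse_state`), the `domain slot = value` layout, and the `;` separator are all illustrative assumptions rather than the authors' method.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (domain, slot) -> value
State = Dict[Tuple[str, str], str]


def serialize_history(turns: List[Tuple[str, str]]) -> str:
    """Flatten (speaker, utterance) pairs into one source string."""
    return " ".join(f"[{speaker}] {utterance}" for speaker, utterance in turns)


def serialize_state(state: State) -> str:
    """Flatten a dialogue state into a target string for generation.

    Assumed format: "domain slot = value ; domain slot = value".
    Values containing ";" would break this toy format.
    """
    parts = [f"{d} {s} = {v}" for (d, s), v in sorted(state.items())]
    return " ; ".join(parts)


def parse_state(text: str) -> State:
    """Recover a state from generated text, skipping malformed chunks."""
    state: State = {}
    for chunk in text.split(";"):
        if "=" not in chunk:
            continue
        key, value = chunk.split("=", 1)
        key_parts = key.split()
        if len(key_parts) < 2:
            continue  # need at least a domain and a slot name
        domain, slot = key_parts[0], " ".join(key_parts[1:])
        state[(domain, slot)] = value.strip()
    return state


# Example: one training pair for a seq2seq model (toy data, not from DSTC9).
turns = [("user", "I need a cheap hotel."), ("system", "Which area?")]
gold = {("hotel", "price range"): "cheap"}
source_text = serialize_history(turns)
target_text = serialize_state(gold)
```

Under this framing, cross-lingual transfer amounts to training on source/target pairs in the resource-rich language and decoding from histories in the low-resource language, with `parse_state` turning the generated string back into slot-value pairs for scoring.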