Spoken Language Understanding (SLU) is an essential step in building a dialogue system. Because labeled data is expensive to obtain, SLU suffers from data scarcity. In this paper, we therefore focus on data augmentation for the slot filling task in SLU. Our aim is to generate more diverse data from the existing data. Specifically, we exploit the latent linguistic knowledge in pretrained language models by fine-tuning them. We propose two fine-tuning strategies: value-based augmentation and context-based augmentation. Experimental results on two public SLU datasets show that, compared with existing data augmentation methods, our method generates more diverse sentences and significantly improves SLU performance.