We present the first empirical study investigating the influence of disfluency detection on the downstream tasks of intent detection and slot filling. We perform this study for Vietnamese -- a low-resource language for which there is neither a prior study nor a publicly available dataset for disfluency detection. First, we extend the fluent Vietnamese intent detection and slot filling dataset PhoATIS by manually adding contextual disfluencies and annotating them. Then, we conduct experiments using strong baselines for disfluency detection and for joint intent detection and slot filling, all based on pre-trained language models. We find that: (i) disfluencies negatively affect the performance of the downstream intent detection and slot filling tasks, and (ii) in the disfluency setting, the pre-trained multilingual language model XLM-R yields better intent detection and slot filling performance than the pre-trained monolingual language model PhoBERT, which is the opposite of what is generally found in the fluent setting.