Spoken language understanding (SLU) treats automatic speech recognition (ASR) and natural language understanding (NLU) as a unified task and usually suffers from data scarcity. We exploit a joint ASR and NLU training method based on meta auxiliary learning to improve the performance of low-resource SLU, taking advantage only of abundant manual transcriptions of speech data. One obvious advantage of this method is that it provides a flexible framework for training a low-resource SLU model without requiring access to any further semantic annotations. In particular, an NLU model serves as a label generation network that predicts intent and slot tags from text; a multi-task network trains the ASR and SLU tasks synchronously from speech; and the predictions of the label generation network are delivered to the multi-task network as semantic targets. The effectiveness of the proposed algorithm is demonstrated with experiments on the public CATSLU dataset, where it produces ASR hypotheses better suited to the downstream NLU task.
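The training scheme described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: all function names, the stand-in losses, and the loss weighting are assumptions made for clarity. The key idea shown is that the NLU label generation network turns manual transcripts into pseudo semantic targets, so the multi-task network can be trained on speech with an ASR loss plus an SLU loss and no human semantic annotation.

```python
def label_generation_network(transcript):
    """Hypothetical NLU model: predicts intent and slot tags from text.
    Stand-in: tags every token as "O" and returns a fixed dummy intent."""
    return {"intent": "inform", "slots": ["O"] * len(transcript.split())}

def dummy_asr_loss(speech_features, transcript):
    # Stand-in for a real ASR objective such as CTC.
    return float(abs(len(speech_features) - len(transcript)))

def dummy_slu_loss(speech_features, pseudo_labels):
    # Stand-in for a real SLU objective such as cross-entropy over tags.
    return float(len(pseudo_labels["slots"]))

def multi_task_step(speech_features, transcript, pseudo_labels, weight=0.5):
    """One joint update of the multi-task network: ASR loss on the manual
    transcript plus a weighted SLU loss on the generated pseudo labels."""
    asr_loss = dummy_asr_loss(speech_features, transcript)
    slu_loss = dummy_slu_loss(speech_features, pseudo_labels)
    return asr_loss + weight * slu_loss  # combined objective to minimize

# Training loop over (speech, transcript) pairs -- no semantic annotation needed.
corpus = [([0.1, 0.2, 0.3], "turn on the light")]
for speech, text in corpus:
    pseudo = label_generation_network(text)   # semantic targets from NLU model
    loss = multi_task_step(speech, text, pseudo)
```

In the full meta auxiliary learning setup, the label generation network itself would also be updated based on how much its pseudo labels improve the multi-task network; that outer loop is omitted here for brevity.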