Spoken language understanding (SLU) is an essential task for machines to understand human speech and interact with users. However, errors from the automatic speech recognizer (ASR) usually hurt understanding performance. In practice, the ASR system may be difficult to adapt to the target scenario. Therefore, this paper focuses on learning utterance representations that are robust to ASR errors using a contrastive objective, and further strengthens generalization by combining supervised contrastive learning with self-distillation during model fine-tuning. Experiments on three benchmark datasets demonstrate the effectiveness of the proposed approach.
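The abstract names two training components but gives no implementation details. As a rough illustration only, the following minimal PyTorch sketch shows one common form of each: a supervised contrastive loss in which utterances sharing a label (e.g., a clean transcript and its ASR hypothesis) are treated as positives, and a temperature-scaled KL term for self-distillation. The function names, the pairing scheme, and all hyperparameters are assumptions for exposition, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of utterance embeddings.

    embeddings: (N, d) utterance representations.
    labels:     (N,) labels; utterances sharing a label (e.g., a clean
                transcript and its ASR hypothesis) are positives.
    """
    z = F.normalize(embeddings, dim=1)          # compare in cosine space
    sim = z @ z.T / temperature                 # (N, N) similarity logits
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    # Positives: same label, excluding the anchor itself.
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    # Log-softmax over all non-self pairs for each anchor.
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability of positives, for anchors that have any.
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    pos_count = pos_mask.sum(dim=1)
    has_pos = pos_count > 0
    loss = -pos_log_prob[has_pos] / pos_count[has_pos]
    return loss.mean()

def self_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between current-model predictions and softened targets
    from an earlier snapshot of the same model (self-distillation)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    with torch.no_grad():
        p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so gradients keep a comparable magnitude across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```

During fine-tuning, one plausible setup would combine these with the task loss, e.g. `loss = ce_loss + a * supervised_contrastive_loss(z, y) + b * self_distillation_loss(logits, snapshot_logits)`, where the weights `a` and `b` are assumed hyperparameters.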