Performance of spoken language understanding (SLU) can be degraded by automatic speech recognition (ASR) errors. We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator, then jointly self-correcting the errors and minimizing the target classification loss. In the proposed error simulator, we leverage confusion networks generated from an ASR decoder, without human transcriptions, to produce a variety of error patterns for model training. We evaluate our approach on the DSTC10 challenge, which targets knowledge-grounded task-oriented conversational dialogues with ASR errors. Experimental results show the effectiveness of our proposed approach, boosting the knowledge-seeking turn detection (KTD) F1 significantly from 0.9433 to 0.9904. Knowledge cluster classification is boosted from 0.7924 to 0.9333 in Recall@1. After knowledge document re-ranking, our approach shows significant improvement in all knowledge selection metrics on the test set: from 0.7358 to 0.7806 in Recall@1, from 0.8301 to 0.9333 in Recall@5, and from 0.7798 to 0.8460 in MRR@5. In the recent DSTC10 evaluation, our approach demonstrates significant improvement in knowledge selection, boosting Recall@1 from 0.495 to 0.7144 compared to the official baseline. Our source code is released on GitHub at https://github.com/yctam/dstc10_track2_task2.git.
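The confusion-network-based corruption step can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes each clean word can be looked up in a confusion bin of decoder alternatives with posterior probabilities, and replaces words at random according to those probabilities.

```python
import random

def corrupt(tokens, confusions, corrupt_prob=0.3, rng=random):
    """Randomly replace clean tokens with ASR-style confusions.

    tokens       -- clean training text, tokenized into words
    confusions   -- dict mapping a clean word to its confusion bin:
                    a list of (alternative, posterior) pairs, as would
                    be read off a decoder's confusion network
    corrupt_prob -- chance of corrupting a word that has a bin
    """
    out = []
    for tok in tokens:
        bin_ = confusions.get(tok)
        if bin_ and rng.random() < corrupt_prob:
            # Sample an alternative proportionally to its posterior.
            words, probs = zip(*bin_)
            out.append(rng.choices(words, weights=probs, k=1)[0])
        else:
            out.append(tok)
    return out

# Toy confusion bins (illustrative values, not from the paper).
confusions = {
    "book": [("book", 0.6), ("brook", 0.3), ("buck", 0.1)],
    "table": [("table", 0.7), ("cable", 0.3)],
}
rng = random.Random(0)
noisy = corrupt("please book a table".split(), confusions, 0.5, rng)
```

In the paper's setting, the corrupted text would then be fed to the joint model, which learns to recover the clean text while optimizing the downstream classification loss.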