The past ten years have witnessed the rapid development of text-based intent detection, whose benchmark performance has been raised to a remarkable level by deep learning techniques. However, automatic speech recognition (ASR) errors are inevitable in real-world applications due to environmental noise, unique speech patterns, and other factors, leading to a sharp performance drop in state-of-the-art text-based intent detection models. Essentially, this phenomenon is caused by the semantic drift that ASR errors introduce. Most existing works focus on designing new model structures to reduce its impact, which comes at the expense of versatility and flexibility. Unlike previous monolithic models, in this paper we propose a novel and agile framework called CR-ID for ASR-error-robust intent detection. It comprises two plug-and-play modules, namely the semantic drift calibration module (SDCM) and the phonemic refinement module (PRM), which are both model-agnostic and can thus be easily integrated into any existing intent detection model without modifying its structure. Experimental results on the SNIPS dataset show that our proposed CR-ID framework achieves competitive performance and outperforms all the baseline methods on ASR outputs, which verifies that CR-ID can effectively alleviate the semantic drift caused by ASR errors.