Intent classification is a fundamental task in spoken language understanding that has recently gained the attention of the scientific community, mainly because it can now be approached with end-to-end neural models. Such models make it possible to avoid intermediate steps, i.e. automatic speech recognition, and thus the propagation of errors caused by background noise, spontaneous speech, users' speaking styles, etc. Towards the development of solutions applicable in real scenarios, it is interesting to investigate how environmental noise and related noise reduction techniques affect the intent classification task when it is addressed with end-to-end neural models. In this paper, we experiment with a noisy version of the Fluent Speech Commands data set, combining the intent classifier with a time-domain speech enhancement solution based on Wave-U-Net and considering different training strategies. Experimental results reveal that, for this task, the use of speech enhancement greatly improves the classification accuracy in noisy conditions, in particular when the classification model is trained on enhanced signals.
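The abstract does not state how the noisy version of the data set was built. As an illustration only, the sketch below shows one common way to produce noisy speech: scaling a noise recording so that the mixture reaches a chosen signal-to-noise ratio. The helper names `mean_power` and `mix_at_snr`, and the choice to tile or truncate the noise to the speech length, are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper: tiling or truncating the noise to the length of the
    clean signal is an assumption of this sketch, not the paper's procedure.
    """
    if not clean or not noise:
        raise ValueError("both signals must be non-empty")
    # Repeat the noise if it is shorter than the speech, then cut to length.
    repeats = -(-len(clean) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[: len(clean)]

    clean_power = mean_power(clean)
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 10 * log10(P_clean / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


# Example: one second of a 440 Hz tone at 16 kHz mixed with Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

In a setup like the one the abstract describes, signals mixed this way could be passed through the enhancement model before classification, or used directly to train the classifier, depending on the training strategy under study.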