Historically, lower-level tasks such as automatic speech recognition (ASR) and speaker identification have been the main focus of the speech field. Recently, interest has grown in higher-level spoken language understanding (SLU) tasks such as sentiment analysis (SA). However, improving performance on SLU tasks remains a major challenge. There are two main approaches to SLU tasks: (1) the two-stage method, which uses a speech model to transcribe speech into text and then uses a language model to produce the downstream-task results; and (2) the one-stage method, which simply fine-tunes a pre-trained speech model on the downstream tasks. The first approach loses emotional cues such as intonation and introduces recognition errors during the ASR process, while the second lacks the necessary language knowledge. In this paper, we propose Wave BERT (WaBERT), a novel end-to-end model that combines a speech model and a language model for SLU tasks. WaBERT is built on pre-trained speech and language models, so training from scratch is not needed. We also freeze most of WaBERT's parameters during training. By integrating audio-specific information and language knowledge in a short-time, low-resource training process, WaBERT improves results on the dev set of the SLUE SA task by 1.15% in recall and 0.82% in F1 score. Additionally, we modify the serial Continuous Integrate-and-Fire (CIF) mechanism to achieve monotonic alignment between the speech and text modalities.
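The CIF mechanism mentioned above monotonically segments a frame-level feature sequence into label-level embeddings: it accumulates per-frame firing weights and "fires" an integrated embedding each time the running total crosses a threshold, splitting the boundary frame's weight between the finished label and the next one. A minimal plain-Python sketch of this standard integration step (the feature vectors and weights below are illustrative toy values, not outputs of the model described in the paper):

```python
def cif(features, weights, threshold=1.0):
    """Continuous Integrate-and-Fire over a frame sequence.

    features: list of equal-length frame vectors (lists of floats)
    weights:  per-frame firing weights (floats), same length as features
    Returns the list of integrated label-level embeddings.
    """
    outputs = []                       # fired label-level embeddings
    acc_w = 0.0                        # weight accumulated since last firing
    acc_v = [0.0] * len(features[0])   # weighted-feature accumulator
    for h, a in zip(features, weights):
        if acc_w + a < threshold:
            # Integrate: not enough accumulated weight to fire yet.
            acc_w += a
            acc_v = [v + a * x for v, x in zip(acc_v, h)]
        else:
            # Fire: split this frame's weight at the threshold boundary.
            used = threshold - acc_w   # portion completing the current label
            outputs.append([v + used * x for v, x in zip(acc_v, h)])
            rest = a - used            # remainder starts the next label
            acc_w = rest
            acc_v = [rest * x for x in h]
    return outputs

# Toy example: 4 one-dimensional frames with uniform weight 0.6
# fire twice (total weight 2.4), yielding two integrated embeddings.
labels = cif([[1.0], [2.0], [3.0], [4.0]], [0.6, 0.6, 0.6, 0.6])
```

Because the accumulator only moves forward through the frames, the resulting speech-to-label alignment is strictly monotonic, which is the property WaBERT relies on to align the speech and text modalities.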