In this paper, we explore the use of pre-trained language models to learn sentiment information from written texts for speech sentiment analysis. First, we investigate how useful a pre-trained language model is in a 2-step pipeline approach that performs Automatic Speech Recognition (ASR) and transcript-based sentiment analysis separately. Second, we propose a pseudo-label-based semi-supervised training strategy that uses a language model within an end-to-end speech sentiment approach to take advantage of a large but unlabeled speech dataset for training. Although spoken and written texts have different linguistic characteristics, they can complement each other in understanding sentiment. Therefore, the proposed system can not only model acoustic characteristics that carry sentiment-specific information in speech signals, but also learn latent sentiment information in the text representation. In our experiments, we demonstrate that the proposed approaches improve F1 scores consistently compared to systems without a language model. Moreover, we show that the proposed framework can reduce human supervision by 65% by leveraging a large amount of data without human sentiment annotation, and that it boosts performance in low-resource conditions where human sentiment annotation is scarce.
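The two ideas summarized above, the 2-step ASR-plus-text-sentiment pipeline and pseudo-label generation for semi-supervised training, can be illustrated roughly as follows. This is a minimal sketch using off-the-shelf Hugging Face pipelines (a wav2vec2 ASR model and a generic pre-trained sentiment classifier) as stand-ins for the paper's own ASR system and fine-tuned language model; the model names, confidence threshold, and file paths are assumptions for illustration, not the authors' actual setup.

```python
from transformers import pipeline

# Stand-in components (assumed models, not the paper's own ASR/LM).
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
sentiment = pipeline("sentiment-analysis")  # generic pre-trained text sentiment model


def two_step_sentiment(wav_path):
    """Step 1: transcribe the utterance; Step 2: classify sentiment on the transcript."""
    transcript = asr(wav_path)["text"]
    pred = sentiment(transcript)[0]  # e.g. {"label": "POSITIVE", "score": 0.98}
    return transcript, pred["label"], pred["score"]


def generate_pseudo_labels(unlabeled_wavs, threshold=0.9):
    """Keep only confident 2-step predictions as pseudo labels for the unlabeled
    speech set; these would augment the human-labeled data when training an
    end-to-end speech sentiment model (that training loop is not shown here)."""
    pseudo = []
    for path in unlabeled_wavs:
        _, label, score = two_step_sentiment(path)
        if score >= threshold:  # assumed confidence filter for pseudo-labeling
            pseudo.append((path, label))
    return pseudo


if __name__ == "__main__":
    # Hypothetical file list; replace with the unlabeled portion of the corpus.
    pseudo_labels = generate_pseudo_labels(["utt001.wav", "utt002.wav"])
    print(pseudo_labels)
```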