Recently, the fields of speech recognition and natural language processing have seen tremendous research progress. This is largely due to well-developed multi-layer deep learning paradigms such as wav2vec2.0, Wav2vecU, WavBERT, and HuBERT, which provide better representation learning and capture richer information. These paradigms are pretrained on hundreds of hours of unlabeled data and then fine-tuned on a small labeled dataset for a specific task. This paper introduces a deep-learning-based emotion recognition model for Arabic speech dialogues. The developed model employs state-of-the-art audio representations, including wav2vec2.0 and HuBERT. The experimental results of our model outperform previously reported outcomes.
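To make the pretrain-then-fine-tune paradigm described above concrete, the following is a minimal sketch of fine-tuning a pretrained wav2vec2.0 checkpoint for speech emotion classification with the Hugging Face transformers library. It is not the authors' exact pipeline; the checkpoint name, the emotion label set, and the single synthetic training example are illustrative assumptions, and a HuBERT checkpoint could be substituted in the same way.

```python
# Minimal sketch (assumptions noted above): fine-tune a pretrained wav2vec2.0
# encoder with a classification head for speech emotion recognition.
import torch
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

emotion_labels = ["angry", "happy", "neutral", "sad"]  # assumed label set

model_name = "facebook/wav2vec2-base"  # a HuBERT checkpoint works the same way
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModelForAudioClassification.from_pretrained(
    model_name,
    num_labels=len(emotion_labels),
)

# One training step on a single placeholder (waveform, label) pair; a real run
# would iterate over a labeled Arabic emotional-speech dataset with a scheduler.
waveform = torch.randn(16000)  # 1 second of 16 kHz audio as a placeholder
inputs = feature_extractor(
    waveform.numpy(), sampling_rate=16000, return_tensors="pt"
)
labels = torch.tensor([2])  # index of "neutral" in the assumed label set

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
outputs = model(**inputs, labels=labels)  # forward pass returns the loss
outputs.loss.backward()
optimizer.step()
```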