Affective speech analysis is an active research topic. A relatively new problem in this field is the analysis of vocal bursts, which are nonverbal vocalisations such as laughs or sighs. Current state-of-the-art approaches to affective vocal burst analysis are mostly based on wav2vec2 or HuBERT features. In this paper, we investigate the use of data2vec, the successor to wav2vec, in combination with a multitask learning pipeline to tackle several analysis problems at once. To assess the performance of our efficient multitask learning architecture, we participate in the 2022 ACII Affective Vocal Burst Challenge, showing that our approach substantially outperforms the baseline established there in three different subtasks.