Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems generalise poorly across different conditions. A key underlying reason for this poor generalisation is the scarcity of emotion datasets, which is a significant roadblock to designing robust machine learning (ML) models. Recent works in SER focus on utilising multitask learning (MTL) methods to improve generalisation by learning shared representations. However, most of these studies propose MTL solutions that require meta labels for the auxiliary tasks, which limits the training of SER systems. This paper proposes an MTL framework (MTL-AUG) that learns generalised representations from augmented data. We utilise augmentation-type classification and unsupervised reconstruction as auxiliary tasks, which allows training SER systems on augmented data without requiring any meta labels for the auxiliary tasks. The semi-supervised nature of MTL-AUG allows the exploitation of abundant unlabelled data to further boost the performance of SER. We comprehensively evaluate the proposed framework in the following settings: (1) within corpus, (2) cross-corpus and cross-language, (3) noisy speech, and (4) adversarial attacks. Our evaluations using the widely used IEMOCAP, MSP-IMPROV, and EMODB datasets show improved results compared to existing state-of-the-art methods.
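To make the multitask setup concrete, the following is a minimal sketch of a shared encoder with three heads in the spirit of MTL-AUG: emotion classification as the primary task, plus augmentation-type classification and unsupervised reconstruction as auxiliary tasks whose targets come for free from the augmentation pipeline and the input itself. All dimensions, layer choices, and the loss weights `alpha` and `beta` are illustrative assumptions, not the authors' architecture.

```python
# Sketch only: a shared encoder with a primary emotion head and two
# label-free auxiliary heads, as described in the abstract. Hyperparameters
# and layers are hypothetical.
import torch
import torch.nn as nn

class MTLAugSketch(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_emotions=4, n_aug_types=3):
        super().__init__()
        # Shared encoder learns representations common to all tasks.
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        # Primary task: emotion classification (needs emotion labels).
        self.emotion_head = nn.Linear(hidden, n_emotions)
        # Auxiliary task 1: classify which augmentation produced the input
        # (labels are generated automatically when augmenting).
        self.aug_head = nn.Linear(hidden, n_aug_types)
        # Auxiliary task 2: reconstruct the input features (no labels needed).
        self.decoder = nn.Linear(hidden, feat_dim)

    def forward(self, x):
        h, _ = self.encoder(x)             # (B, T, hidden)
        pooled = h.mean(dim=1)             # simple temporal average pooling
        return self.emotion_head(pooled), self.aug_head(pooled), self.decoder(h)

model = MTLAugSketch()
x = torch.randn(8, 100, 40)                # 8 utterances, 100 frames each
emo_logits, aug_logits, recon = model(x)

# Joint loss: the auxiliary terms also apply to augmented/unlabelled data;
# alpha and beta are hypothetical weighting hyperparameters.
alpha, beta = 0.3, 0.3
ce = nn.CrossEntropyLoss()
emo_labels = torch.randint(0, 4, (8,))     # placeholder emotion labels
aug_labels = torch.randint(0, 3, (8,))     # placeholder augmentation labels
loss = ce(emo_logits, emo_labels) + alpha * ce(aug_logits, aug_labels) \
       + beta * nn.functional.mse_loss(recon, x)
loss.backward()
```

The key design point this sketch illustrates is that only the emotion head consumes human-annotated labels; the two auxiliary losses can be computed on any augmented or unlabelled utterance, which is what enables the semi-supervised training described above.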