In recent years, deep-learning-based speech emotion recognition models have outperformed classical machine learning models. Previously, neural network designs, such as Multitask Learning, have accounted for variations in emotional expressions due to demographic and contextual factors. However, existing models face a few constraints: 1) they rely on a clear definition of domains (e.g. gender, noise condition, etc.) and the availability of domain labels; 2) they often attempt to learn domain-invariant features while emotion expressions can be domain-specific. In the present study, we propose the Nonparametric Hierarchical Neural Network (NHNN), a lightweight hierarchical neural network model based on Bayesian nonparametric clustering. In comparison to Multitask Learning approaches, the proposed model does not require domain/task labels. In our experiments, the NHNN models generally outperform the models with similar levels of complexity and state-of-the-art models in within-corpus and cross-corpus tests. Through clustering analysis, we show that the NHNN models are able to learn group-specific features and bridge the performance gap between groups.
翻译:近年来,基于深学习的语音情绪识别模型优于经典机器学习模型。以前,多任务学习等神经网络设计考虑到人口和环境因素导致情感表达方式的差异。但是,现有模型面临一些制约因素:1)它们依赖于对领域(例如性别、噪音状况等)和域名的可用性的明确定义;2)它们常常试图学习域别差异性特征,而情感表达方式可以针对具体领域。在本研究中,我们提议采用非对称高度神经网络(NHNNN),这是以巴耶斯非参数组合为基础的轻量级神经网络模型。与多任务学习方法相比,拟议的模型不需要域/任务标签。在我们的实验中,NHNNN模式一般比模型的复杂程度和最先进的模型在公司内部和跨子体测试中都高出类似水平。我们通过集群分析,表明NHNNM模型能够学习特定群体特征并缩小群体之间的性能差距。