In this paper, we propose a novel deep transfer learning method called deep implicit distribution alignment networks (DIDAN) for the cross-corpus speech emotion recognition (SER) problem, in which the labeled training (source) and unlabeled testing (target) speech signals come from different corpora. Specifically, DIDAN first adopts a simple deep regression network, consisting of a set of convolutional and fully connected layers, to regress the source speech spectrums directly onto their emotion labels, which gives DIDAN its emotion-discriminative ability. This ability is then made applicable to the target speech samples, regardless of corpus variance, through a well-designed regularization term called implicit distribution alignment (IDA). Unlike the widely used maximum mean discrepancy (MMD) and its variants, IDA draws on the idea of sample reconstruction to implicitly bridge the distribution gap between the source and target corpora, which enables DIDAN to learn both emotion-discriminative and corpus-invariant features from speech spectrums. To evaluate DIDAN, we carry out extensive cross-corpus SER experiments on widely used speech emotion corpora. Experimental results show that DIDAN outperforms many recent state-of-the-art methods on cross-corpus SER tasks.
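The abstract does not give the IDA formula, so the following is only an illustrative sketch of reconstruction-based alignment in general, not the authors' implementation. It assumes that each target feature vector is rebuilt as a linear combination of source feature vectors, and that the reconstruction error serves as the alignment penalty. The function names (`matmul`, `ida_loss`, `fit_coeffs`), the optional L1 weight, and the gradient-descent fitting of the coefficients are all hypothetical choices made for this example.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def ida_loss(source_feats, target_feats, coeffs, l1_weight=0.0):
    """Reconstruction-style alignment penalty (assumed form, not from the paper).

    source_feats: n_s x d features from the labeled source corpus
    target_feats: n_t x d features from the unlabeled target corpus
    coeffs:       n_t x n_s reconstruction coefficients
    Returns the mean squared reconstruction error over target samples,
    plus an optional L1 penalty on the coefficients.
    """
    recon = matmul(coeffs, source_feats)
    sq_err = sum((t - r) ** 2
                 for t_row, r_row in zip(target_feats, recon)
                 for t, r in zip(t_row, r_row))
    l1 = sum(abs(c) for row in coeffs for c in row)
    return sq_err / len(target_feats) + l1_weight * l1


def fit_coeffs(source_feats, target_feats, steps=500, lr=0.05):
    """Fit coefficients by plain gradient descent on the squared error only."""
    n_t, n_s = len(target_feats), len(source_feats)
    coeffs = [[0.0] * n_s for _ in range(n_t)]
    source_t = [list(col) for col in zip(*source_feats)]  # d x n_s
    for _ in range(steps):
        recon = matmul(coeffs, source_feats)
        resid = [[t - r for t, r in zip(t_row, r_row)]
                 for t_row, r_row in zip(target_feats, recon)]
        # Gradient of ||T - Z S||^2 w.r.t. Z is -2 (T - Z S) S^T.
        neg_half_grad = matmul(resid, source_t)
        coeffs = [[c + 2 * lr * g for c, g in zip(c_row, g_row)]
                  for c_row, g_row in zip(coeffs, neg_half_grad)]
    return coeffs


# Toy features: two source samples that span only the first two dimensions.
source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
aligned_target = [[0.5, 0.5, 0.0]]   # lies in the span of the source features
shifted_target = [[0.5, 0.5, 1.0]]   # has a component the source cannot explain

loss_aligned = ida_loss(source, aligned_target, fit_coeffs(source, aligned_target))
loss_shifted = ida_loss(source, shifted_target, fit_coeffs(source, shifted_target))
print(loss_aligned, loss_shifted)
```

In this toy setting the penalty is close to zero when the target features can be rebuilt from the source features and grows when they cannot. In a network like the one described, such a term would be added to the regression loss on the labeled source samples, so that minimizing both pushes the learned features toward being emotion-discriminative and corpus-invariant.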