Estimating dimensional emotions, such as activation, valence, and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech appears to be possible, accurate valence estimation remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance. Lexical information can be obtained from pre-trained acoustic models, whose learned representations can improve valence estimation from speech. We investigate the use of pre-trained model representations to improve valence estimation from the acoustic speech signal. We also explore fusion of representations to improve emotion estimation across all three emotion dimensions: activation, valence, and dominance. Additionally, we investigate whether representations from pre-trained models can be distilled into models trained with low-level features, resulting in models with far fewer parameters. We show that fusion of pre-trained model embeddings results in a 79% relative improvement in concordance correlation coefficient (CCC) on valence estimation compared to a standard acoustic feature baseline (mel-filterbank energies), while distillation from pre-trained model embeddings to lower-dimensional representations yields a 12% relative improvement. These performance gains were observed on two evaluation sets, indicating that the proposed architecture generalizes across them. We report new state-of-the-art "text-free", acoustic-only dimensional emotion estimation CCC values on two MSP-Podcast evaluation sets.
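For reference, the CCC values reported above follow the standard (Lin's) concordance correlation coefficient between predictions $\hat{y}$ and reference labels $y$, with means $\mu$, standard deviations $\sigma$, and Pearson correlation $\rho$:

$$\mathrm{CCC} = \frac{2\,\rho\,\sigma_{\hat{y}}\,\sigma_{y}}{\sigma_{\hat{y}}^{2} + \sigma_{y}^{2} + \left(\mu_{\hat{y}} - \mu_{y}\right)^{2}}$$

CCC ranges from $-1$ to $1$ and penalizes both low correlation and systematic bias between predictions and labels, which is why it is the customary metric for dimensional emotion estimation.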