Emotion is an inherently subjective psychophysiological human state, and producing an agreed-upon representation (gold standard) for continuous emotion requires a time-consuming and costly training procedure involving multiple human annotators. There is strong evidence in the literature that physiological signals are sufficient objective markers for states of emotion, particularly arousal. In this contribution, we utilise a dataset which includes continuous emotion annotations and physiological signals - Heartbeats per Minute (BPM), Electrodermal Activity (EDA), and respiration rate - captured during a stress-induced scenario (Trier Social Stress Test). We utilise a Long Short-Term Memory Recurrent Neural Network to explore the benefit of fusing these physiological signals with arousal as the target, learning from various audio-, video-, and text-based features. We utilise the state-of-the-art MuSe-Toolbox to consider both annotation delay and inter-rater agreement weighting when fusing the target signals. An improvement in Concordance Correlation Coefficient (CCC) is seen across feature sets when fusing EDA with arousal, compared to the arousal-only gold standard results. Additionally, the results for BERT-based textual features improved when fusing arousal with all physiological signals, obtaining up to .3344 CCC compared to .2118 CCC for arousal only. Multimodal fusion also improves overall CCC, with audio plus video features obtaining up to .6157 CCC when recognising arousal fused with EDA and BPM.
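Since all reported results are expressed as Concordance Correlation Coefficient (CCC) values, the metric itself may be worth making concrete. The following is a minimal sketch of CCC for two 1-D sequences using NumPy; the function name `ccc` and the use of population (biased) variance are our assumptions here, not a detail taken from the abstract.

```python
import numpy as np

def ccc(x, y):
    """Concordance Correlation Coefficient between two 1-D sequences.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)

    Unlike Pearson correlation, CCC also penalises differences in
    mean and scale, so it rewards predictions that match the gold
    standard in both trend and absolute value.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances
    cov = ((x - mx) * (y - my)).mean()   # population covariance
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```

A perfectly concordant prediction yields a CCC of 1, while a prediction with the right shape but a constant offset is penalised through the squared mean-difference term in the denominator.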