Research on human emotion under multimedia stimulation based on physiological signals is an emerging field, and important progress has been made in emotion recognition based on multi-modal signals. However, it remains challenging to make full use of the complementarity among spatial-spectral-temporal domain features for emotion recognition, and to model the heterogeneity and correlation among multi-modal signals. In this paper, we propose a novel two-stream heterogeneous graph recurrent neural network, named HetEmotionNet, that fuses multi-modal physiological signals for emotion recognition. Specifically, HetEmotionNet consists of a spatial-temporal stream and a spatial-spectral stream, which fuse spatial-spectral-temporal domain features in a unified framework. Each stream is composed of a graph transformer network for modeling heterogeneity, a graph convolutional network for modeling correlation, and a gated recurrent unit for capturing temporal or spectral domain dependencies. Extensive experiments on two real-world datasets demonstrate that the proposed model achieves better performance than state-of-the-art baselines.
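To make the two-stream design concrete, the following is a minimal NumPy sketch, not the authors' implementation: each stream applies a graph convolution (modeling correlation among channels) to every slice of its input (time steps for the spatial-temporal stream, frequency bands for the spatial-spectral stream), runs a small GRU over the resulting sequence, and the two final hidden states are fused for classification. The graph transformer network that models heterogeneity is omitted here for brevity, and all shapes, pooling choices, and weight initializations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gcn_layer(A, X, W):
    """Symmetric-normalized graph convolution over channel adjacency A."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d[:, None] * d[None, :]  # D^-1/2 (A + I) D^-1/2
    return np.tanh(A_norm @ X @ W)

def gru(seq, hid):
    """Tiny GRU over a list of input vectors; returns the final hidden state."""
    d_in = seq[0].shape[0]
    Wz = rng.normal(0, 0.1, (hid, hid + d_in))
    Wr = rng.normal(0, 0.1, (hid, hid + d_in))
    Wh = rng.normal(0, 0.1, (hid, hid + d_in))
    h = np.zeros(hid)
    for x in seq:
        hx = np.concatenate([h, x])
        z = sigmoid(Wz @ hx)                  # update gate
        r = sigmoid(Wr @ hx)                  # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))
        h = (1 - z) * h + z * h_tilde
    return h

def stream(slices, A, W_gcn, hid):
    """One stream: GCN per slice (time step or frequency band), GRU over slices."""
    pooled = [gcn_layer(A, X, W_gcn).mean(axis=0) for X in slices]  # mean-pool nodes
    return gru(pooled, hid)

# Toy setup (assumed sizes): 4 channels, 3 features each, 6 time steps, 5 bands.
n, f, hid, T, B = 4, 3, 8, 6, 5
A = (rng.random((n, n)) > 0.5).astype(float)
A = np.maximum(A, A.T)                        # symmetric channel adjacency
W_gcn = rng.normal(0, 0.1, (f, hid))

temporal_slices = [rng.normal(size=(n, f)) for _ in range(T)]  # spatial-temporal input
spectral_slices = [rng.normal(size=(n, f)) for _ in range(B)]  # spatial-spectral input

h_t = stream(temporal_slices, A, W_gcn, hid)  # temporal-domain dependency
h_s = stream(spectral_slices, A, W_gcn, hid)  # spectral-domain dependency
fused = np.concatenate([h_t, h_s])            # fuse the two streams

W_out = rng.normal(0, 0.1, (2, fused.shape[0]))
logits = W_out @ fused                        # illustrative 2-class emotion logits
```

The shared GCN weights and mean pooling here are simplifications; the point is only the data flow: per-slice graph convolution, recurrence across slices, then late fusion of the two streams.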