Multimodal machine learning is an emerging research area that has received considerable scholarly attention in recent years. To date, however, few studies have addressed multimodal emotion recognition in conversation. Since Graph Neural Networks (GNNs) possess powerful relational modeling capacity, they have an inherent advantage in multimodal learning. Multimodal data can be modeled as a graph in which each data object is a node and the intra- and inter-modal dependencies between data objects are edges. GNNs leverage the graph constructed from multimodal data to perform intra- and inter-modal information interaction, which effectively facilitates the integration and complementation of multimodal data. In this work, we propose a novel Graph attention based Multimodal Fusion Technique (GraphMFT) for Emotion Recognition in Conversation (ERC). GraphMFT utilizes multiple improved graph attention networks to capture intra-modal contextual information and inter-modal complementary information. In addition, the proposed GraphMFT attempts to address the challenges of existing graph-based multimodal ERC models such as MMGCN. Empirical results on two public multimodal datasets reveal that our model outperforms State-Of-The-Art (SOTA) approaches, achieving accuracies of 67.90% and 61.30%, respectively.
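To make the graph construction described above concrete, the following is a minimal sketch, assuming PyTorch Geometric, of how one conversation with three modalities (text, audio, visual) can be turned into a graph with intra-modal context edges and inter-modal edges, then passed through a graph attention layer. It is illustrative only, not the authors' implementation; the node counts, feature dimensions, and edge rules are assumptions.

```python
# Minimal sketch (not GraphMFT itself): build a multimodal conversation graph
# and apply a graph attention layer. Assumes torch and torch_geometric.
import torch
from torch_geometric.nn import GATConv

num_utterances = 4      # utterances in one conversation (assumption)
feat_dim = 100          # per-modality feature size (assumption)

# One node per (utterance, modality): text, audio, visual.
x = torch.randn(3 * num_utterances, feat_dim)

edges = []
# Intra-modal context edges: link consecutive utterances within each modality.
for m in range(3):
    base = m * num_utterances
    for i in range(num_utterances - 1):
        edges += [(base + i, base + i + 1), (base + i + 1, base + i)]
# Inter-modal edges: link the three modality nodes of the same utterance.
for i in range(num_utterances):
    t, a, v = i, num_utterances + i, 2 * num_utterances + i
    for u, w in [(t, a), (t, v), (a, v)]:
        edges += [(u, w), (w, u)]
edge_index = torch.tensor(edges, dtype=torch.long).t()

# A graph attention layer lets each node aggregate its intra- and
# inter-modal neighbors with learned attention weights.
gat = GATConv(feat_dim, feat_dim, heads=1)
out = gat(x, edge_index)    # fused node representations
print(out.shape)            # torch.Size([12, 100])
```

In this toy setup, each utterance's text, audio, and visual nodes exchange information both along the conversational timeline (intra-modal edges) and across modalities (inter-modal edges), which is the interaction pattern the abstract attributes to graph-based multimodal fusion.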