Multimodal machine learning is an emerging research area that has received a great deal of scholarly attention in recent years. To date, however, there have been few studies on multimodal conversational emotion recognition. Since Graph Neural Networks (GNNs) possess a powerful capacity for relational modeling, they have an inherent advantage in the field of multimodal learning: GNNs leverage a graph constructed from multimodal data to perform intra- and inter-modal information interaction, which effectively facilitates the integration and complementation of multimodal data. In this work, we propose a novel Graph network based Multimodal Fusion Technique (GraphMFT) for emotion recognition in conversation. Multimodal data can be modeled as a graph in which each data object is regarded as a node, and both intra- and inter-modal dependencies between data objects are regarded as edges. GraphMFT utilizes multiple improved graph attention networks to capture intra-modal contextual information and inter-modal complementary information. In addition, the proposed GraphMFT addresses the challenges of existing graph-based multimodal ERC models such as MMGCN. Empirical results on two public multimodal datasets show that our model outperforms the State-Of-The-Art (SOTA) approaches with accuracies of 67.90% and 61.30%.
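
To make the graph construction concrete, the following is a minimal sketch (not the authors' released code) of how utterance-level features from three modalities might be assembled into a single graph with intra-modal contextual edges and inter-modal edges, and then passed through a standard graph attention layer from PyTorch Geometric. The function name `build_multimodal_graph`, the context-window size, and the use of the generic `GATConv` in place of the paper's improved graph attention networks are illustrative assumptions.

```python
# Illustrative sketch, assuming PyTorch Geometric is available.
# Node i*num_modalities + m denotes the m-th modality of utterance i.
import torch
from torch_geometric.nn import GATConv


def build_multimodal_graph(num_utterances, num_modalities=3, window=2):
    """Build edge_index for a multimodal conversation graph.

    Intra-modal edges connect an utterance to its neighbours within `window`
    turns of the same modality; inter-modal edges connect the different
    modalities of the same utterance.
    """
    edges = []
    for i in range(num_utterances):
        for m in range(num_modalities):
            u = i * num_modalities + m
            # intra-modal contextual edges within the window
            for j in range(max(0, i - window), min(num_utterances, i + window + 1)):
                if j != i:
                    edges.append((u, j * num_modalities + m))
            # inter-modal edges between modalities of the same utterance
            for m2 in range(num_modalities):
                if m2 != m:
                    edges.append((u, i * num_modalities + m2))
    return torch.tensor(edges, dtype=torch.long).t().contiguous()


# Toy example: 4 utterances, 16-dimensional features per modality node.
num_utt, dim = 4, 16
x = torch.randn(num_utt * 3, dim)          # stacked audio/visual/text node features
edge_index = build_multimodal_graph(num_utt)
gat = GATConv(dim, dim, heads=4, concat=False)
out = gat(x, edge_index)                   # attention-fused node representations
print(out.shape)                           # torch.Size([12, 16])
```

In this sketch a single attention layer operates over both edge types at once; the paper instead stacks multiple improved graph attention networks to separately capture intra-modal contextual and inter-modal complementary information.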