Emotion Recognition in Conversation (ERC) plays a significant role in Human-Computer Interaction (HCI) systems, since it enables empathetic services. Multimodal ERC can mitigate the drawbacks of uni-modal approaches. Recently, Graph Neural Networks (GNNs) have been widely used in a variety of fields due to their superior performance in relation modeling. In multimodal ERC, GNNs are capable of extracting both long-distance contextual information and inter-modal interactive information. Unfortunately, because existing methods such as MMGCN fuse multiple modalities directly, redundant information may be generated and heterogeneous information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information. GraphCFC alleviates the heterogeneity-gap problem in multimodal fusion by utilizing multiple subspace extractors and a Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately when performing message passing. Furthermore, we design a GNN structure called GAT-MLP, which provides a new unified network framework for multimodal learning. Experimental results on two benchmark datasets show that our GraphCFC outperforms the state-of-the-art (SOTA) approaches.
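To make the GAT-MLP idea concrete, the following is a minimal NumPy sketch of one such block: a single-head graph-attention step followed by a two-layer MLP, each with a residual connection. This is an illustrative assumption about the block's layout, not the paper's exact architecture; all class and parameter names (`GATMLPLayer`, `W`, `a`, etc.) are hypothetical, and the adjacency matrix is assumed to already contain self-loops.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Row-wise softmax restricted to positions where mask is True."""
    scores = np.where(mask, scores, -1e9)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    e = np.where(mask, e, 0.0)
    return e / e.sum(axis=-1, keepdims=True)

class GATMLPLayer:
    """One GAT-MLP block (illustrative): graph attention -> residual
    -> 2-layer MLP -> residual."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(dim, dim))    # value/projection
        self.a = rng.normal(scale=0.1, size=(2 * dim,))    # attention scorer
        self.W1 = rng.normal(scale=0.1, size=(dim, dim))   # MLP layer 1
        self.W2 = rng.normal(scale=0.1, size=(dim, dim))   # MLP layer 2

    def forward(self, H, A):
        """H: (n, dim) node features; A: (n, n) boolean adjacency
        (self-loops assumed present)."""
        n, d = H.shape
        Z = H @ self.W
        # GAT-style pairwise scores: a^T [z_i || z_j], via a split of `a`
        scores = (Z @ self.a[:d])[:, None] + (Z @ self.a[d:])[None, :]
        scores = np.where(scores > 0, scores, 0.2 * scores)  # LeakyReLU
        alpha = masked_softmax(scores, A)                    # attend over neighbors
        H_att = alpha @ Z + H                                # attention + residual
        H_mlp = np.maximum(H_att @ self.W1, 0.0) @ self.W2   # ReLU MLP
        return H_mlp + H_att                                 # MLP + residual
```

In a multimodal graph, the nodes would be per-modality utterance features and `A` would encode the paper's edge types (contextual and cross-modal); here a plain boolean adjacency stands in for that structure.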