Conversations have become a critical data format on social media platforms. Understanding conversations in terms of emotion, content, and other aspects has also attracted increasing attention from researchers, owing to its widespread applications in human-computer interaction. In real-world environments, we often encounter the problem of incomplete modalities, which has become a core issue in conversation understanding. To address this problem, researchers have proposed various methods. However, existing approaches are mainly designed for individual utterances or medical images rather than conversational data, and thus cannot exploit temporal and speaker information in conversations. To this end, we propose a novel framework for incomplete multimodal learning in conversations, called "Graph Complete Network (GCNet)", filling the gap left by existing works. Our GCNet contains two well-designed graph neural network-based modules, "Speaker GNN" and "Temporal GNN", which capture speaker and temporal dependencies in conversations. To make full use of both complete and incomplete data during feature learning, we jointly optimize classification and reconstruction in an end-to-end manner. To verify the effectiveness of our method, we conduct experiments on three benchmark conversational datasets. Experimental results demonstrate that GCNet outperforms existing state-of-the-art approaches to incomplete multimodal learning.
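To make the described design concrete, here is a minimal PyTorch sketch of the two ideas the abstract names: graph modules over same-speaker and temporally adjacent utterances, and a jointly optimized classification-plus-reconstruction objective. This is an illustrative sketch under stated assumptions, not the authors' released implementation; the module names, the mean-aggregation graph layer, the feature dimensions, and `lambda_rec` are all hypothetical.

```python
# Minimal sketch of GCNet-style joint training (illustrative; not the authors' code).
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """One mean-aggregation message-passing layer over a dense adjacency matrix."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (num_utts, dim); adj: (num_utts, num_utts), row-normalized with self-loops
        return torch.relu(self.lin(adj @ x))

class GCNetSketch(nn.Module):
    def __init__(self, feat_dim=128, num_classes=6):
        super().__init__()
        self.speaker_gnn = SimpleGraphConv(feat_dim)   # edges link same-speaker utterances
        self.temporal_gnn = SimpleGraphConv(feat_dim)  # edges link temporally adjacent utterances
        self.classifier = nn.Linear(2 * feat_dim, num_classes)
        self.decoder = nn.Linear(2 * feat_dim, feat_dim)  # reconstructs missing-modality features

    def forward(self, x, speaker_adj, temporal_adj):
        h = torch.cat([self.speaker_gnn(x, speaker_adj),
                       self.temporal_gnn(x, temporal_adj)], dim=-1)
        return self.classifier(h), self.decoder(h)

def joint_loss(logits, labels, recon, target_feats, mask, lambda_rec=1.0):
    # mask marks which utterance features were actually observed: the reconstruction
    # term is supervised only on observed features, so complete and incomplete
    # samples both contribute gradients to the shared encoder.
    cls_loss = nn.functional.cross_entropy(logits, labels)
    rec_loss = ((recon - target_feats) ** 2 * mask.unsqueeze(-1)).mean()
    return cls_loss + lambda_rec * rec_loss

# Toy usage: a conversation of 5 utterances with 128-d fused features, 6 classes.
x = torch.randn(5, 128)
adj = torch.eye(5)  # stand-in adjacency; real graphs encode speaker/temporal links
model = GCNetSketch()
logits, recon = model(x, adj, adj)
loss = joint_loss(logits, torch.randint(0, 6, (5,)), recon, x, torch.ones(5))
loss.backward()
```

Concatenating the two GNN outputs gives the classifier and the reconstruction decoder one shared conversation-aware representation, which is what lets classification and reconstruction be optimized jointly end-to-end rather than in separate imputation and recognition stages.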