Conversations have become a critical data format on social media platforms. Understanding conversations in terms of emotion, content, and other aspects has attracted increasing attention from researchers due to its widespread applications in human-computer interaction. In real-world environments, we often encounter the problem of incomplete modalities, which has become a core issue in conversation understanding. To address this problem, researchers have proposed various methods. However, existing approaches are mainly designed for individual utterances rather than conversational data and therefore cannot fully exploit the temporal and speaker information in conversations. To this end, we propose a novel framework for incomplete multimodal learning in conversations, called "Graph Complete Network (GCNet)", filling the gap left by existing works. Our GCNet contains two well-designed graph neural network-based modules, "Speaker GNN" and "Temporal GNN", to capture speaker and temporal dependencies. To make full use of both complete and incomplete data, we jointly optimize the classification and reconstruction tasks in an end-to-end manner. To verify the effectiveness of our method, we conduct experiments on three benchmark conversational datasets. Experimental results demonstrate that our GCNet outperforms existing state-of-the-art approaches to incomplete multimodal learning. Code is available at https://github.com/zeroQiaoba/GCNet.
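The joint optimization mentioned above combines a classification objective on the predictions with a reconstruction objective on the imputed features of missing modalities. A minimal sketch of such a combined loss, assuming a softmax cross-entropy classification term and a mean-squared reconstruction term (the function name and the trade-off weight `lam` are illustrative, not taken from the paper):

```python
import math

def joint_loss(logits, label, recon, target, lam=1.0):
    """Combined objective: classification loss + lam * reconstruction loss.

    logits : raw class scores for one utterance
    label  : index of the ground-truth class
    recon  : reconstructed features for a missing modality
    target : the corresponding ground-truth features
    lam    : illustrative trade-off weight between the two tasks
    """
    # Softmax cross-entropy for the classification branch
    # (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    ce = -math.log(exps[label] / sum(exps))
    # Mean-squared error for the reconstruction branch.
    mse = sum((r - t) ** 2 for r, t in zip(recon, target)) / len(recon)
    return ce + lam * mse

# With uniform logits, the cross-entropy term is log(2); a perfect
# reconstruction contributes zero, so the total is log(2).
loss = joint_loss([0.0, 0.0], 0, [1.0], [1.0])
```

Optimizing both terms end-to-end lets complete samples supervise the classifier directly while incomplete samples still contribute gradients through the reconstruction term.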