Emotion Recognition in Conversations (ERC) is essential for building empathetic human-machine systems. Existing ERC studies focus primarily on summarizing the contextual information in a conversation, ignoring the differentiated emotional behaviors within and across modalities. Designing strategies that fit these differentiated multi-modal emotional behaviors can yield more accurate emotion predictions. We therefore propose DialogueTRM, which explores the differentiated emotional behaviors from both intra- and inter-modal perspectives. For the intra-modal perspective, we construct a novel Hierarchical Transformer that can easily switch between sequential and feed-forward structures according to the differentiated context preference within each modality. For the inter-modal perspective, we devise a novel Multi-Grained Interactive Fusion that applies both neuron- and vector-grained feature interactions to learn the differentiated contributions across all modalities. Experimental results show that DialogueTRM outperforms the state-of-the-art by a significant margin on three benchmark datasets.
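To make the two fusion granularities concrete, the following is a toy NumPy sketch, not the authors' implementation: it assumes three modality feature vectors (text, audio, visual) of equal dimension, models vector-grained interaction as one softmax weight per modality, and neuron-grained interaction as an element-wise gate per feature. The function name `multi_grained_fusion` and the gating choices are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def multi_grained_fusion(text, audio, visual):
    """Toy sketch of multi-grained fusion (illustrative, not the paper's exact method).

    Vector-grained: one scalar weight per modality, so each modality
    contributes as a whole. Neuron-grained: an element-wise gate, so each
    feature dimension weighs the modalities independently.
    """
    feats = np.stack([text, audio, visual])          # (3, d)

    # Vector-grained interaction: scalar weight per modality vector.
    w = softmax(feats.mean(axis=1))                  # (3,)
    vector_fused = (w[:, None] * feats).sum(axis=0)  # (d,)

    # Neuron-grained interaction: per-dimension gate across modalities.
    g = softmax(feats, axis=0)                       # (3, d)
    neuron_fused = (g * feats).sum(axis=0)           # (d,)

    # Combine the two granularities (simple sum here for illustration).
    return vector_fused + neuron_fused
```

In a trained model the weights and gates would be produced by learned projections of the features rather than the raw softmaxes used here; the sketch only shows the difference in granularity between the two interaction schemes.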