Emotion Recognition in Conversations (ERC) is an important and active research problem. Recent work has shown the benefits of using multiple modalities (e.g., text, audio, and video) for the ERC task. In a conversation, participants tend to maintain a particular emotional state unless some external stimulus evokes a change; there is a continuous ebb and flow of emotion over the course of a dialogue. Inspired by this observation, we propose a multimodal ERC model and augment it with an emotion-shift component. The proposed emotion-shift component is modular and can be added to any existing multimodal ERC model (with a few modifications) to improve emotion recognition. We experiment with different variants of the model, and results show that the inclusion of the emotion-shift signal helps the model outperform existing multimodal ERC models, achieving state-of-the-art performance on the MOSEI and IEMOCAP datasets.