Emotion Recognition in Conversations (ERC) is an important and active research area. Recent work has shown the benefits of using multiple modalities (e.g., text, audio, and video) for the ERC task. In a conversation, participants tend to maintain a particular emotional state unless some stimulus evokes a change; there is a continuous ebb and flow of emotions in a conversation. Inspired by this observation, we propose a multimodal ERC model and augment it with an emotion-shift component that improves performance. The proposed emotion-shift component is modular and can be added to any existing multimodal ERC model (with a few modifications). We experiment with different variants of the model, and the results show that including the emotion-shift signal helps the model outperform existing ERC models on the MOSEI and IEMOCAP datasets.