Emotion Recognition in Conversations (ERC) has been gaining importance as conversational agents become increasingly common. Recognizing emotions is key to effective communication and a crucial component in the development of effective and empathetic conversational agents. Knowledge and understanding of the conversational context are extremely valuable for identifying the emotions of the interlocutor. We therefore approach Emotion Recognition in Conversations by leveraging the conversational context, i.e., by taking previous conversational turns into account. The usual approach to modeling the conversational context has been to produce context-independent representations of each utterance and subsequently perform contextual modeling over these representations. Here we propose context-dependent embedding representations of each utterance, obtained by leveraging the contextual representational power of pre-trained transformer language models. In our approach, we feed the conversational context, appended to the utterance to be classified, as input to the RoBERTa encoder, on top of which we add a simple classification module. This removes the need to deal with context after obtaining the embeddings, since these already constitute an efficient representation of that context. We also investigate how the number of conversational turns introduced as context influences model performance. The effectiveness of our approach is validated on the widely used open-domain DailyDialog dataset and on the task-oriented EmoWOZ dataset, on which we attain state-of-the-art results, surpassing ERC models that also resort to RoBERTa but employ more complex classification modules. This indicates that our context-dependent embedding utterance representations combined with a simple classification model can be more effective than context-independent utterance representations paired with more complex classification modules.
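As a minimal sketch of the described setup (not the authors' released implementation), the following illustrates how previous conversational turns can be concatenated with the target utterance, separated by the tokenizer's separator token, and fed to a pre-trained RoBERTa encoder topped by a simple linear classification head. The checkpoint name `roberta-base`, the helper `build_input`, the parameter `num_context_turns`, and the label count are illustrative assumptions.

```python
import torch
from torch import nn
from transformers import RobertaModel, RobertaTokenizer

NUM_LABELS = 7  # assumption: e.g., DailyDialog's six emotions plus "no emotion"

class ContextualERC(nn.Module):
    """RoBERTa encoder with a simple classification module on top."""

    def __init__(self, num_labels: int = NUM_LABELS):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        # Simple classification module: one linear layer over the <s> embedding.
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # The <s> (first-token) embedding is already context-dependent,
        # since the encoder attended over the whole context + utterance input.
        cls = out.last_hidden_state[:, 0]
        return self.classifier(cls)

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

def build_input(context_turns, utterance, num_context_turns=2):
    # Keep only the most recent turns and append the utterance to classify;
    # varying num_context_turns probes the effect of context length.
    turns = context_turns[-num_context_turns:] + [utterance]
    text = f" {tokenizer.sep_token} ".join(turns)
    return tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

model = ContextualERC()
enc = build_input(["Hi, how was your day?", "Honestly, pretty rough."],
                  "Oh no, what happened?")
logits = model(enc["input_ids"], enc["attention_mask"])  # shape: (1, NUM_LABELS)
```

Because the encoder attends jointly over the context and the utterance, no separate contextual modeling stage is needed after the embeddings are obtained; a single linear layer suffices as the classifier.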