In dialogue systems, utterances with similar semantics may carry distinct emotions in different contexts. Therefore, modeling long-range contextual emotional relationships with speaker dependency plays a crucial role in dialogue emotion recognition. Meanwhile, distinguishing different emotion categories is non-trivial, since they often express semantically similar sentiments. To this end, we adopt supervised contrastive learning to make different emotions mutually exclusive, so that similar emotions can be identified more accurately. In addition, we employ an auxiliary response generation task to enhance the model's ability to handle contextual information, thereby forcing the model to recognize emotions with similar semantics across diverse contexts. To achieve these objectives, we use the pre-trained encoder-decoder model BART as our backbone, since it is well suited to both understanding and generation tasks. Experiments on four datasets demonstrate that our proposed model obtains significantly better results than state-of-the-art models in dialogue emotion recognition. The ablation study further demonstrates the effectiveness of the supervised contrastive loss and the generative loss.
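To make the supervised contrastive objective concrete, the sketch below implements a standard supervised contrastive loss over a batch of utterance representations, where representations sharing an emotion label are pulled together and all others are pushed apart. This is a minimal NumPy illustration of the general technique, not the paper's exact formulation; the function name, temperature value, and toy inputs are assumptions for illustration.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch.

    features: (N, d) utterance representations (e.g. from a BART encoder)
    labels:   (N,) integer emotion labels
    Anchors with the same label are treated as positives; each anchor's
    loss is the mean negative log-probability of its positives under a
    softmax over all other examples in the batch.
    """
    # L2-normalize so the dot product is cosine similarity
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-contrast

    # log-softmax over each row (the anchor's comparisons)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # positives: same label, not the anchor itself
    pos_mask = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)

    # average over each anchor's positives; skip anchors with no positive
    # (singleton labels) to avoid division by zero
    valid = pos_mask.sum(axis=1) > 0
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = pos_log_prob[valid] / pos_mask.sum(axis=1)[valid]
    return -per_anchor.mean()

# Toy check: when embeddings cluster by label, the loss is lower than
# when same-label embeddings are scattered across clusters.
labels = np.array([0, 0, 1, 1])
clustered = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
scattered = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
loss_good = supervised_contrastive_loss(clustered, labels)
loss_bad = supervised_contrastive_loss(scattered, labels)
```

Making emotions "mutually exclusive" corresponds to the push-apart term here: the softmax denominator includes all other utterances in the batch, so representations of semantically close but differently labeled emotions are driven apart.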