Emotion dynamics modeling is a significant task in emotion recognition in conversation. It aims to predict conversational emotions when building empathetic dialogue systems. Existing studies mainly develop models based on Recurrent Neural Networks (RNNs), which cannot benefit from recently developed pre-training strategies for better token representation learning in conversations. More seriously, it is hard to distinguish the dependencies of interlocutors and the emotional influence among them when features are simply assembled on top of RNNs. In this paper, we develop a series of BERT-based models to specifically capture the inter-interlocutor and intra-interlocutor dependencies of conversational emotion dynamics. Concretely, we first substitute BERT for RNNs to enrich the token representations. Then, a Flat-structured BERT (F-BERT) is applied to link up utterances in a conversation directly, and a Hierarchically-structured BERT (H-BERT) is employed to distinguish the interlocutors when linking up utterances. More importantly, a Spatial-Temporal-structured BERT, namely ST-BERT, is proposed to further determine the emotional influence among interlocutors. Finally, we conduct extensive experiments on two popular emotion recognition in conversation benchmark datasets and demonstrate that our proposed models attain around 5% and 10% improvement over the state-of-the-art baselines, respectively.
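The difference between linking utterances directly and distinguishing interlocutors first can be illustrated with a small input-preparation sketch. This is not the paper's implementation: the abstract does not specify how F-BERT or H-BERT build their inputs, so the helper names `flat_input` and `per_speaker_inputs`, the whitespace tokenization, and the sample conversation are all assumptions made for illustration. Only the `[CLS]` and `[SEP]` markers are standard BERT special tokens.

```python
from collections import defaultdict

# Hypothetical toy conversation: (speaker, utterance) pairs in turn order.
conversation = [
    ("A", "I lost my keys"),
    ("B", "oh no"),
    ("A", "found them"),
]

def flat_input(turns):
    """Flat view (assumed): chain every utterance into one sequence.

    Whitespace splitting stands in for a real subword tokenizer.
    """
    tokens = ["[CLS]"]
    for _speaker, utterance in turns:
        tokens += utterance.split() + ["[SEP]"]
    return tokens

def per_speaker_inputs(turns):
    """Hierarchical view (assumed): group utterances by interlocutor first,
    preserving each speaker's own turn order."""
    grouped = defaultdict(list)
    for speaker, utterance in turns:
        grouped[speaker].append(utterance)
    return dict(grouped)

flat = flat_input(conversation)
by_speaker = per_speaker_inputs(conversation)
```

In the flat view, speaker identity is lost once the utterances are chained together, whereas the grouped view keeps each interlocutor's history separate, which is the kind of distinction a hierarchical model could exploit.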