Emotion recognition in conversation (ERC) has attracted much attention in recent years because it is essential to a wide range of applications. Most existing ERC methods model the self-speaker and inter-speaker context separately, which leaves too little interaction between the two. In this paper, we propose S+PAGE, a novel Speaker- and Position-Aware Graph neural network model for ERC, which works in three stages to combine the strengths of the Transformer and the relational graph convolutional network (R-GCN) for better contextual modeling. First, a two-stream conversational Transformer extracts coarse self-speaker and inter-speaker contextual features for each utterance. Then, a speaker- and position-aware conversation graph is constructed, and we propose an enhanced R-GCN model, called PAG, to refine the coarse features under the guidance of a relative positional encoding. Finally, the features from the first two stages are fed into a conditional random field layer to model emotion transfer.
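To make the three-stage pipeline concrete, the following is a minimal PyTorch sketch of how the pieces could fit together. It is not the authors' implementation: the module and argument names (PAGLayer, SPageSketch, num_relations, rel_pos_bias) are illustrative assumptions, the relational convolution is a bare-bones stand-in for the paper's PAG, and the CRF decoding layer is only indicated in a comment.

```python
# Hypothetical sketch of the three-stage S+PAGE architecture described above.
import torch
import torch.nn as nn


class PAGLayer(nn.Module):
    """One relational graph convolution step with per-relation weights,
    a rough stand-in for the position-aware R-GCN (PAG) in stage two."""

    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel_weight = nn.Parameter(torch.randn(num_relations, dim, dim) * 0.02)
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, h, edge_index, edge_type, rel_pos_bias):
        # h: (N, dim) utterance features; edge_index: (2, E); edge_type: (E,)
        # rel_pos_bias: (E, 1) gate derived from a relative positional encoding (assumed form).
        src, dst = edge_index
        msg = torch.einsum("ed,edk->ek", h[src], self.rel_weight[edge_type])
        msg = msg * rel_pos_bias                       # modulate messages by relative position
        out = torch.zeros_like(h).index_add_(0, dst, msg)
        return torch.relu(self.self_loop(h) + out)


class SPageSketch(nn.Module):
    def __init__(self, dim=256, num_relations=8, num_emotions=7):
        super().__init__()
        # Stage 1: two parallel Transformer encoders for self- and inter-speaker context.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.self_stream = nn.TransformerEncoder(layer, num_layers=2)
        self.inter_stream = nn.TransformerEncoder(layer, num_layers=2)
        # Stage 2: refinement over the speaker- and position-aware conversation graph.
        self.pag = PAGLayer(dim, num_relations)
        # Stage 3: emission scores; a CRF layer (e.g. torchcrf.CRF) would sit on top
        # to model emotion transfer between consecutive utterances.
        self.emit = nn.Linear(2 * dim, num_emotions)

    def forward(self, utt_feats, edge_index, edge_type, rel_pos_bias):
        # utt_feats: (1, N, dim) utterance features for one conversation.
        coarse = self.self_stream(utt_feats) + self.inter_stream(utt_feats)
        refined = self.pag(coarse.squeeze(0), edge_index, edge_type, rel_pos_bias)
        # Both coarse and refined features feed the final (CRF) layer, as in the abstract.
        return self.emit(torch.cat([coarse.squeeze(0), refined], dim=-1))
```

The sketch keeps only the data flow of the abstract: two Transformer streams produce coarse features, a relational graph layer refines them with a relative-position signal, and both feature sets are combined before emotion decoding.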