The success of emotional conversation systems depends on sufficient perception and appropriate expression of emotions. In a real-world conversation, we first instinctively perceive emotions from multi-source information, including the emotion flow of the dialogue history, facial expressions, and the personalities of speakers, and then express suitable emotions according to our own personalities; however, these multiple types of information remain insufficiently exploited in emotional conversation generation. To address this issue, we propose a heterogeneous graph-based model for emotional conversation generation. Specifically, we design a Heterogeneous Graph-Based Encoder that represents the conversation content (i.e., the dialogue history, its emotion flow, facial expressions, and speakers' personalities) with a heterogeneous graph neural network and then predicts suitable emotions for the response. After that, we employ an Emotion-Personality-Aware Decoder to generate a response that is not only relevant to the conversation context but also carries appropriate emotions, taking as inputs the encoded graph representations, the emotions predicted by the encoder, and the personality of the current speaker. Experimental results show that our model effectively perceives emotions from multi-source knowledge, generates satisfactory responses, and significantly outperforms previous state-of-the-art models.
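To make the encoder stage concrete, below is a minimal sketch of how a conversation could be encoded as a heterogeneous graph and used to predict a response emotion, written with PyTorch Geometric. This is not the paper's implementation: the node and edge types, feature dimensions, emotion label set, and pooling scheme are all illustrative assumptions, and the decoder stage is omitted entirely.

```python
# Minimal sketch (not the paper's code): encode a conversation as a
# heterogeneous graph and predict an emotion for the response.
# All node/edge types, dimensions, and names are assumptions.
import torch
from torch_geometric.data import HeteroData
from torch_geometric.nn import HGTConv

NUM_EMOTIONS = 7  # assumed label set, e.g. six basic emotions + neutral

# --- Build the heterogeneous conversation graph ---------------------------
data = HeteroData()
data['utterance'].x = torch.randn(5, 768)  # dialogue-history utterance embeddings
data['speaker'].x = torch.randn(2, 64)     # personality embeddings of two speakers
data['emotion'].x = torch.randn(5, 32)     # emotion-flow features, one per utterance
data['face'].x = torch.randn(5, 128)       # facial-expression features, one per utterance

# Temporal edges along the dialogue history.
data['utterance', 'next', 'utterance'].edge_index = torch.tensor(
    [[0, 1, 2, 3], [1, 2, 3, 4]])
# Each speaker is linked to the utterances they produced (alternating turns).
data['speaker', 'utters', 'utterance'].edge_index = torch.tensor(
    [[0, 1, 0, 1, 0], [0, 1, 2, 3, 4]])
# Per-utterance emotion and facial-expression evidence.
data['emotion', 'labels', 'utterance'].edge_index = torch.tensor(
    [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]])
data['face', 'shows', 'utterance'].edge_index = torch.tensor(
    [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]])

# --- Heterogeneous graph encoder + emotion predictor ----------------------
conv = HGTConv(
    in_channels={'utterance': 768, 'speaker': 64, 'emotion': 32, 'face': 128},
    out_channels=256, metadata=data.metadata(), heads=4)
emotion_head = torch.nn.Linear(256, NUM_EMOTIONS)

out = conv(data.x_dict, data.edge_index_dict)  # message passing across node types
context = out['utterance'].mean(dim=0)         # pooled conversation representation
emotion_logits = emotion_head(context)         # emotion predicted for the response
print(emotion_logits.softmax(-1))              # distribution over response emotions
```

In the full model described above, the pooled graph representations, the predicted emotion, and the current speaker's personality embedding would additionally condition a response decoder (e.g., through its initial state or attention); that generation step is beyond the scope of this sketch.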