Current approaches to empathetic response generation typically encode the entire dialogue history and feed the result to a decoder that generates friendly feedback. These methods focus on modelling contextual information but neglect the speaker's immediate intention. We argue that, empirically, the last utterance in a dialogue conveys the speaker's intention. We therefore propose InferEM, a novel model for empathetic response generation. InferEM encodes the last utterance separately and fuses it with the entire dialogue through a multi-head-attention-based intention fusion module to capture the speaker's intention. In addition, we use the preceding utterances to predict the last utterance, which mimics the human tendency to guess in advance what an interlocutor is about to say. To balance the optimization rates of utterance prediction and response generation, we design a multi-task learning strategy for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.
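The fusion idea in the abstract, where the separately encoded last utterance is combined with the whole dialogue through multi-head attention, can be illustrated with a minimal sketch. This is not InferEM's actual module: the function names (`fuse_intention`, `split_heads`, `attend`) are hypothetical, the learned query/key/value projections of a real attention layer are omitted, and heads are formed simply by slicing the feature dimension. Under those assumptions, last-utterance token vectors act as queries over all dialogue token vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention for a single head."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(head_dim).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def split_heads(vectors, num_heads):
    """Slice each vector into num_heads equal chunks (one list per head)."""
    head_dim = len(vectors[0]) // num_heads
    return [
        [v[h * head_dim:(h + 1) * head_dim] for v in vectors]
        for h in range(num_heads)
    ]


def fuse_intention(last_utterance, dialogue, num_heads=2):
    """Hypothetical fusion step: last-utterance tokens attend over the dialogue.

    Assumption: no learned projections; keys and values are the raw
    dialogue vectors, and queries are the raw last-utterance vectors.
    """
    if len(dialogue[0]) % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    q_heads = split_heads(last_utterance, num_heads)
    kv_heads = split_heads(dialogue, num_heads)
    per_head = [attend(q, kv, kv) for q, kv in zip(q_heads, kv_heads)]
    # Concatenate the head outputs back into one vector per query token.
    fused = []
    for t in range(len(last_utterance)):
        row = []
        for head in per_head:
            row.extend(head[t])
        fused.append(row)
    return fused


# Toy example: three dialogue token vectors; the last one stands in
# for the separately encoded last utterance.
dialogue = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
last_utterance = dialogue[-1:]
fused = fuse_intention(last_utterance, dialogue)
```

Each fused vector has the same size as a dialogue token vector and is a weighted average of the dialogue vectors, with weights driven by similarity to the last utterance. A trained model would add learned projections and would feed the fused representation to the decoder.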