Current approaches to empathetic response generation typically encode the entire dialogue history and feed the result to a decoder to generate a friendly response. These methods focus on modelling contextual information but neglect the speaker's direct intention. We argue that, empirically, the last utterance in a dialogue conveys the speaker's intention. Consequently, we propose a novel model named InferEM for empathetic response generation. We encode the last utterance separately and fuse it with the entire dialogue through a multi-head-attention-based intention fusion module to capture the speaker's intention. In addition, we use the previous utterances to predict the last utterance, which simulates the human tendency to guess in advance what the interlocutor may say. To balance the optimization rates of utterance prediction and response generation, we design a multi-task learning strategy for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.
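As a rough illustration of the fusion idea described in the abstract, the sketch below lets the separately encoded last utterance attend over the encoding of the whole dialogue with multi-head scaled dot-product attention. It is a minimal sketch under stated assumptions, not the InferEM implementation: the function names (`fuse_intention`, `multitask_loss`), the choice of the last utterance as the query side, the omission of learned projection matrices, and the fixed-weight sum of the two losses are all hypothetical and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-width vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def split_heads(vectors, num_heads):
    """Slice every vector into num_heads contiguous chunks, one list per head."""
    d = len(vectors[0]) // num_heads
    return [[v[h * d:(h + 1) * d] for v in vectors] for h in range(num_heads)]


def fuse_intention(last_utt, dialogue, num_heads=2):
    """Hypothetical fusion step: last-utterance tokens query the full dialogue.

    Assumption: learned query/key/value projections are omitted for brevity,
    so each head simply works on its own slice of the feature dimension.
    """
    if len(last_utt[0]) % num_heads != 0:
        raise ValueError("feature width must be divisible by num_heads")
    q_heads = split_heads(last_utt, num_heads)
    kv_heads = split_heads(dialogue, num_heads)
    head_outs = [attend(q, kv, kv) for q, kv in zip(q_heads, kv_heads)]
    # Concatenate the per-head outputs back into one vector per query token.
    fused = []
    for t in range(len(last_utt)):
        row = []
        for h in range(num_heads):
            row.extend(head_outs[h][t])
        fused.append(row)
    return fused


def multitask_loss(gen_loss, pred_loss, pred_weight=0.5):
    """Assumed stand-in for the multi-task objective: a fixed-weight sum of the
    response-generation loss and the utterance-prediction loss."""
    return gen_loss + pred_weight * pred_loss


# Toy encodings: three dialogue tokens and one last-utterance token, width 4.
dialogue = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
last_utt = [[1.0, 1.0, 0.0, 0.0]]
fused = fuse_intention(last_utt, dialogue, num_heads=2)
```

Each fused vector is a per-head weighted average of the dialogue vectors, so it has the same width as the inputs and could be passed on to a decoder. A fixed `pred_weight` is only the simplest way to combine the two objectives; the abstract does not specify how the optimization rates are balanced.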