Consistency is one of the major challenges faced by dialogue agents. A human-like dialogue agent should not only respond naturally, but also maintain a consistent persona. In this paper, we exploit the advantages of natural language inference (NLI) techniques to address the problem of generating persona-consistent dialogues. Unlike existing work that re-ranks retrieved responses with an NLI model, we cast the task as a reinforcement learning problem and propose to exploit NLI signals from response-persona pairs as rewards for the dialogue generation process. Specifically, our generator employs an attention-based encoder-decoder to generate persona-based responses. Our evaluator consists of two components: an adversarially trained naturalness module and an NLI-based consistency module. Moreover, we use a separate, well-performing NLI model to evaluate persona consistency. Experimental results on both human and automatic metrics, including a model-based consistency evaluation, demonstrate that the proposed approach outperforms strong generative baselines, especially in the persona consistency of the generated responses.
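To make the reward design concrete, the following is a minimal sketch, not the authors' released code, of how the two evaluator signals described above could be combined into a scalar reward for REINFORCE-style policy-gradient training. The functions `nli_consistency` and `naturalness` and the weight `alpha` are illustrative assumptions standing in for the paper's NLI consistency module and adversarially trained naturalness module.

```python
# A minimal sketch of the reward structure described in the abstract:
# an NLI-based consistency score and an adversarial naturalness score are
# combined into one scalar reward per sampled response, which then weights
# the sequence log-likelihood in a REINFORCE loss. All module internals
# here are illustrative stand-ins, not the paper's actual models.

import torch
import torch.nn.functional as F

def nli_consistency(response_emb: torch.Tensor, persona_emb: torch.Tensor) -> torch.Tensor:
    # Stand-in for an NLI model scoring (persona, response) pairs: cosine
    # similarity mapped to [0, 1]. A real consistency module would output
    # something like P(entailment) for the pair.
    cos = F.cosine_similarity(response_emb, persona_emb, dim=-1)
    return (cos + 1.0) / 2.0

def naturalness(response_emb: torch.Tensor) -> torch.Tensor:
    # Stand-in for an adversarially trained discriminator's probability
    # that the response is human-written.
    return torch.sigmoid(response_emb.mean(dim=-1))

def reinforce_loss(log_probs: torch.Tensor,
                   response_emb: torch.Tensor,
                   persona_emb: torch.Tensor,
                   alpha: float = 0.5) -> torch.Tensor:
    # Scalar reward per sampled response: a weighted sum of the two
    # evaluator signals (alpha is an assumed hyperparameter). REINFORCE
    # minimizes -reward * sum_t log pi(y_t); the reward is detached so
    # gradients flow only through the generator's log-probabilities.
    reward = alpha * nli_consistency(response_emb, persona_emb) \
             + (1.0 - alpha) * naturalness(response_emb)
    return -(reward.detach() * log_probs.sum(dim=-1)).mean()

# Toy usage: batch of 4 responses, 10 tokens each, 16-dim embeddings.
# In practice, log_probs would be the decoder's per-token log-likelihoods
# of the sampled tokens; random tensors are used here only to show shapes.
log_probs = torch.randn(4, 10, requires_grad=True)
resp_emb, pers_emb = torch.randn(4, 16), torch.randn(4, 16)
loss = reinforce_loss(log_probs, resp_emb, pers_emb)
loss.backward()  # gradients reach log_probs, i.e., the generator
```

One design point this sketch mirrors from the abstract: the reward combines two separate evaluator components rather than a single discriminator, so the generator is pushed toward responses that are simultaneously natural and entailed by the persona.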