Current neural network models of dialogue generation (chatbots) show great promise for generating responses for conversational agents, but they are short-sighted: they predict utterances one at a time while ignoring their impact on future turns. Modelling the future direction of a dialogue is crucial for generating coherent, interesting conversations, a need that traditional NLP dialogue systems addressed by relying on reinforcement learning. In this article, we explain how to combine these strengths by using deep reinforcement learning to predict future rewards in chatbot dialogue. The model simulates conversations between two virtual agents, with policy gradient methods used to reward sequences that exhibit three useful conversational properties: information flow, coherence, and ease of responding (related to the forward-looking function). We evaluate the model on diversity, length of dialogue, and human judgments. In dialogue simulation, evaluations show that the proposed model generates more interactive responses and sustains the conversation more successfully. This work marks a preliminary step toward building a neural conversational model optimized for the long-term success of dialogues.
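To make the training procedure concrete, here is a minimal, hypothetical REINFORCE-style sketch in PyTorch. It is not the authors' exact architecture or reward design: a toy encoder-decoder policy samples a reply to the previous turn, a placeholder `conversational_reward` stands in for the combined ease-of-answering, information-flow, and coherence rewards, and a few turns of self-play update the policy by gradient ascent on reward-weighted log-probabilities.

```python
# Minimal policy-gradient sketch for dialogue self-play (illustrative only).
# The reward function below is a hypothetical placeholder, not the paper's
# actual scorers; a real system would use learned models for each property.
import torch
import torch.nn as nn

VOCAB, EMB, HID, MAX_LEN = 1000, 64, 128, 20

class DialoguePolicy(nn.Module):
    """Toy encoder-decoder policy: encodes the previous turn, samples a reply."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRUCell(EMB, HID)
        self.out = nn.Linear(HID, VOCAB)

    def sample_reply(self, prev_turn):
        # prev_turn: (1, T) token ids of the other agent's utterance
        _, h = self.encoder(self.embed(prev_turn))
        h = h.squeeze(0)                                # (1, HID) decoder state
        token = torch.zeros(1, dtype=torch.long)        # assume id 0 = <bos>
        log_probs, tokens = [], []
        for _ in range(MAX_LEN):
            h = self.decoder(self.embed(token), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            token = dist.sample()
            log_probs.append(dist.log_prob(token))
            tokens.append(token.item())
        return tokens, torch.stack(log_probs).sum()

def conversational_reward(reply_tokens, history):
    # Hypothetical stand-in for the combined reward (ease of answering,
    # information flow, coherence); here it only penalizes degenerate replies
    # that repeat a single token. `history` is unused in this toy version.
    return 1.0 if len(set(reply_tokens)) > 1 else -1.0

policy = DialoguePolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Self-play: the policy's reply becomes the prompt for the next simulated turn.
turn = torch.randint(1, VOCAB, (1, 5))                  # random opening prompt
for _ in range(3):
    tokens, log_prob = policy.sample_reply(turn)
    reward = conversational_reward(tokens, turn)
    loss = -reward * log_prob                           # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    turn = torch.tensor([tokens])                       # reply feeds the next turn
```

In a full system, each of the three conversational properties would be scored separately and combined into a single reward (for example as a weighted sum), and the two virtual agents would take turns over longer simulated dialogues rather than the short loop shown here.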