In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback to models pre-trained with supervised learning. The efficiency of such a learning method may suffer from the mismatch in dialogue state distribution between the offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interactions with users by learning from human teaching and feedback. We design a neural-network-based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent's capability in successfully completing a task.