Current research in dialogue systems focuses on conversational assistants that handle short conversations in either task-oriented or open-domain settings. In this paper, we focus on improving task-based conversational assistants online, primarily those working on document-type conversations (e.g., emails) whose contents may or may not be fully related to the assistant's task. We propose "NARLE", a deep reinforcement learning (RL) framework for improving the natural language understanding (NLU) component of dialogue systems online, without the need to collect human labels for customer data. The proposed solution associates user emotion with the assistant's action and uses it to improve NLU models via policy gradients. For two intent classification problems, we empirically show that using reinforcement learning to fine-tune pre-trained supervised models improves performance by up to 43%. Furthermore, we demonstrate the robustness of the method to partial and noisy implicit feedback.
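A minimal sketch of the policy-gradient fine-tuning loop the abstract describes, under assumptions the abstract does not pin down: the model (`IntentClassifier`), the update helper (`reinforce_step`), and the mapping of user emotion to a scalar reward (+1 positive, -1 negative) are all hypothetical illustrations, not NARLE's actual implementation.

```python
# Illustrative REINFORCE-style fine-tuning of a pre-trained intent classifier,
# where the reward comes from implicit user feedback (emotion) rather than labels.
import torch
import torch.nn as nn

class IntentClassifier(nn.Module):
    """Stand-in for a pre-trained NLU intent model (hypothetical architecture)."""
    def __init__(self, input_dim=768, num_intents=10):
        super().__init__()
        self.head = nn.Linear(input_dim, num_intents)

    def forward(self, features):
        return self.head(features)  # logits over intent classes

def reinforce_step(model, optimizer, features, reward):
    """One policy-gradient update: sample an intent (the 'action'), then scale
    its log-probability by the emotion-derived reward."""
    logits = model(features)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()  # intent the assistant acts on
    loss = -(reward * dist.log_prob(action)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return action, loss.item()

# Usage with placeholder data; in the paper's setting the reward would be
# inferred from the user's emotional reaction to the assistant's action.
model = IntentClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
features = torch.randn(4, 768)                # encoded utterances (placeholder)
reward = torch.tensor([1.0, -1.0, 1.0, 1.0])  # implicit feedback per example
action, loss = reinforce_step(model, optimizer, features, reward)
```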