Task-oriented dialog presents a difficult challenge encompassing multiple problems, including multi-turn language understanding and generation, knowledge retrieval and reasoning, and action prediction. Modern dialog systems typically begin by converting the conversation history into a symbolic representation, referred to as the belief state, using supervised learning. The belief state is then used to query an external knowledge source; the retrieved result, along with the conversation history, is used for the action prediction and response generation tasks independently. Such a pipeline of individually optimized components not only makes the development process cumbersome but also makes it non-trivial to leverage session-level user reinforcement signals. In this paper, we develop Neural Assistant: a single neural network model that takes the conversation history and an external knowledge source as input and jointly produces both the text response and the action to be taken by the system as output. The model learns to reason over the provided knowledge source with a weak supervision signal coming from the text generation and action prediction tasks, hence removing the need for belief state annotations. On the MultiWOZ dataset, we study the effect of distant supervision and of the size of the knowledge base on model performance. We find that the Neural Assistant, without belief states, is able to incorporate external knowledge information, achieving higher factual accuracy scores compared to the Transformer baseline. In settings comparable to reported baseline systems, the Neural Assistant, when provided with the oracle belief state, significantly improves language generation performance.
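The joint interface described above — conversation history and a knowledge source in, text response and system action out, with no intermediate belief state — can be sketched as follows. This is a minimal illustrative stand-in, not the paper's model: all names (`neural_assistant`, `AssistantOutput`, the knowledge-entry fields) are hypothetical, and the token-overlap scoring is a toy substitute for the learned attention over the knowledge source.

```python
from dataclasses import dataclass

@dataclass
class AssistantOutput:
    response: str  # text response shown to the user
    action: str    # symbolic action to be executed by the system

def neural_assistant(history: list[str], knowledge: list[dict]) -> AssistantOutput:
    """Toy stand-in for the joint model: score knowledge entries by token
    overlap with the conversation history, then emit a response and an
    action grounded in the best-matching entry. The actual model replaces
    this heuristic with a neural network that attends over the history and
    the knowledge source jointly, trained end to end on response and
    action supervision only (no belief state annotations)."""
    query = set(" ".join(history).lower().split())

    def overlap(entry: dict) -> int:
        return len(query & set(entry["text"].lower().split()))

    best = max(knowledge, key=overlap)
    return AssistantOutput(
        response=f"I found {best['name']} for you.",
        action=f"book({best['name']})",
    )

# Example: a tiny hand-made knowledge base with two entries.
history = ["I need a cheap hotel in the centre"]
knowledge = [
    {"name": "Alpha Hotel", "text": "cheap hotel centre"},
    {"name": "Beta Bistro", "text": "expensive restaurant north"},
]
out = neural_assistant(history, knowledge)
```

The key design point the sketch mirrors is that a single function consumes raw history plus knowledge and produces both outputs jointly, so response generation and action prediction share the same grounding in the knowledge source rather than being optimized as separate pipeline stages.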