In a real-world dialogue system, generated responses must satisfy several interlocking constraints: they must be informative, truthful, and easy to control. The two predominant paradigms in language generation, neural language modeling and rule-based generation, both struggle to satisfy these constraints. Even the best neural models are prone to hallucination and omission of information, while existing formalisms for rule-based generation make it difficult to write grammars that are both flexible and fluent. We describe a hybrid architecture for dialogue response generation that combines the strengths of both approaches. This architecture has two components. The first is a rule-based content selection model defined using a new formal framework called dataflow transduction, which uses declarative rules to transduce a dialogue agent's computations (represented as dataflow graphs) into context-free grammars representing the space of contextually acceptable responses. The second is a constrained decoding procedure that uses these grammars to constrain the output of a neural language model, which selects fluent utterances from that space. The resulting system outperforms both rule-based and learned approaches in human evaluations of fluency, relevance, and truthfulness.
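To make the second component concrete, the following is a minimal sketch of grammar-constrained decoding, not the system's actual implementation. It assumes a tiny, non-recursive context-free grammar whose language can be enumerated outright, and it replaces the neural language model with a hand-written scoring function. The grammar, the `toy_lm_score` function, and all other names here are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy grammar (an assumption, not taken from the paper):
# each nonterminal maps to a list of productions; anything else is a terminal.
GRAMMAR = {
    "S": [["I", "found", "NP"], ["There", "is", "NP"]],
    "NP": [["one", "event"], ["an", "event", "TIME"]],
    "TIME": [["today"], ["tomorrow"]],
}
END = "<eos>"  # marker meaning "the prefix is already a complete sentence"


def expand(symbol):
    """Return every token sequence derivable from `symbol`.

    Only valid for finite, non-recursive grammars such as the toy one above.
    """
    if symbol not in GRAMMAR:
        return [[symbol]]
    results = []
    for production in GRAMMAR[symbol]:
        parts = [expand(s) for s in production]
        for combo in product(*parts):
            results.append([tok for part in combo for tok in part])
    return results


def allowed_next(prefix, sentences):
    """Tokens that can follow `prefix` in at least one grammatical sentence."""
    allowed = set()
    for sentence in sentences:
        if sentence[:len(prefix)] == prefix:
            if len(sentence) > len(prefix):
                allowed.add(sentence[len(prefix)])
            else:
                allowed.add(END)
    return allowed


def toy_lm_score(prefix, token):
    """Stand-in for a neural LM's next-token score (assumed, hand-written)."""
    preferences = {"There": 2.0, "an": 1.5, "tomorrow": 1.0}
    return preferences.get(token, 0.0)


def constrained_greedy_decode(start="S"):
    """Greedy decoding where the model may only pick grammar-licensed tokens."""
    sentences = expand(start)
    prefix = []
    while True:
        # Sorting makes tie-breaking deterministic.
        candidates = sorted(allowed_next(prefix, sentences))
        best = max(candidates, key=lambda tok: toy_lm_score(prefix, tok))
        if best == END:
            return prefix
        prefix.append(best)
```

Calling `" ".join(constrained_greedy_decode())` yields `There is an event tomorrow`: the scorer expresses a preference at each step, but every choice is restricted to tokens the grammar permits, so the output is always one of the six sentences the grammar licenses. A realistic system would use an incremental parser rather than enumeration, since practical grammars are recursive or too large to list.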