Generative models for dialog systems have gained much interest because of the recent success of RNN- and Transformer-based models in tasks like question answering and summarization. Although dialog response generation is generally framed as a sequence-to-sequence (Seq2Seq) problem, researchers have found it challenging to train dialog systems with standard Seq2Seq models. Therefore, to help the model learn meaningful utterance- and conversation-level features, Sordoni et al. (2015b) and Serban et al. (2016) proposed a hierarchical RNN architecture, which was later adopted by several other RNN-based dialog systems. With transformer-based models now dominating Seq2Seq problems, a natural question is whether the notion of hierarchy also applies to transformer-based dialog systems. In this paper, we propose a generalized framework for Hierarchical Transformer Encoders and show how a standard transformer can be morphed into any hierarchical encoder, including HRED- and HIBERT-like models, by using specially designed attention masks and positional encodings. Through a wide range of experiments, we demonstrate that hierarchical encoding helps transformer-based models achieve better natural language understanding of the contexts in task-oriented dialog systems.
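To make the mechanism concrete, the following is a minimal sketch (not the authors' released code) of how an attention mask can turn a standard transformer encoder into an HRED-style hierarchical encoder. It assumes a dialog flattened into one token sequence with hypothetical `turn_ids` marking which utterance each token belongs to, and one designated "context" token per utterance that is allowed to attend across utterances; all names and the choice of PyTorch are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def utterance_local_mask(turn_ids: torch.Tensor) -> torch.Tensor:
    """Boolean mask [seq, seq]: True where attention is allowed.
    Ordinary tokens attend only within their own utterance."""
    return turn_ids.unsqueeze(0) == turn_ids.unsqueeze(1)

def hierarchical_mask(turn_ids: torch.Tensor,
                      context_positions: torch.Tensor) -> torch.Tensor:
    """Extend the local mask so the per-utterance context tokens
    (assumed here to be the last token of each utterance) also
    attend to each other across utterances."""
    mask = utterance_local_mask(turn_ids)
    ctx = torch.zeros_like(turn_ids, dtype=torch.bool)
    ctx[context_positions] = True
    mask |= ctx.unsqueeze(0) & ctx.unsqueeze(1)  # context <-> context
    return mask

# Example: three utterances of lengths 3, 2, and 4.
turn_ids = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2])
context_positions = torch.tensor([2, 4, 8])
mask = hierarchical_mask(turn_ids, context_positions)

# A standard encoder layer consumes the mask unchanged; PyTorch's
# src_mask uses True for *disallowed* positions, hence the inversion.
layer = torch.nn.TransformerEncoderLayer(d_model=16, nhead=4,
                                         batch_first=True)
x = torch.randn(1, turn_ids.numel(), 16)
out = layer(x, src_mask=~mask)
print(out.shape)  # torch.Size([1, 9, 16])
```

In the same spirit, turn-level positional IDs (e.g. feeding `turn_ids` through a second embedding table added to the token-level positions) would supply the utterance-aware positional encodings the abstract refers to; every row of the mask keeps at least its own utterance visible, so no token is left with an empty attention context.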