Recently, Transformer-based pretrained language models (PLMs), such as GPT2 and T5, have been leveraged to build generative task-oriented dialog (TOD) systems. A drawback of existing PLM-based models is their non-Markov architecture across turns, i.e., the whole dialog history is used as the conditioning input at each turn. First, this is inefficient in memory and computation. Furthermore, conditioning on the whole history increases model complexity and may hurt training efficiency, especially when only small amounts of labeled training data are available (the low-resource setting). In this paper, motivated by the observation that dialog states can be viewed as Markov states, we propose to build Markovian Generative Architectures (MGA) over PLM backbones for efficient TOD systems. Experiments on MultiWOZ2.1 show that in the rich-resource setting, the proposed Markov models reduce memory and time costs without degrading performance; in the low-resource setting, the training-efficiency advantage of the Markov models is even more significant.
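To make the contrast concrete, the following is a minimal sketch of how the conditioning input could be assembled under the non-Markov scheme (full history) versus a Markovian scheme that keeps only the previous dialog state and response. The function names, field names, and serialized format are illustrative assumptions in the style of common generative TOD pipelines, not the authors' actual code or data format.

```python
# Sketch only: contrasts input construction for non-Markov vs. Markov conditioning.
# The dict keys and string layout below are hypothetical, for illustration.

def build_full_history_input(turns, user_utt):
    """Non-Markov: concatenate every previous turn plus the new user utterance."""
    history = []
    for t in turns:
        history += [t["user"], t["belief_state"], t["db_result"], t["response"]]
    return " ".join(history + [user_utt])

def build_markov_input(turns, user_utt):
    """Markov-style: only the previous belief state and previous response
    summarize the history, so the input length stays bounded per turn."""
    if turns:
        prev = turns[-1]
        context = [prev["belief_state"], prev["response"]]
    else:
        context = []
    return " ".join(context + [user_utt])

# Toy dialog log to compare how the input grows across turns.
turns = [
    {"user": "i need a cheap hotel in the north",
     "belief_state": "[hotel] price cheap area north",
     "db_result": "[db_2]",
     "response": "there are two options . do you need parking ?"},
    {"user": "yes , with free parking please",
     "belief_state": "[hotel] price cheap area north parking yes",
     "db_result": "[db_1]",
     "response": "city lodge matches . shall i book it ?"},
]

new_user_utt = "yes , book it for 2 nights"
print(len(build_full_history_input(turns, new_user_utt).split()))  # grows with dialog length
print(len(build_markov_input(turns, new_user_utt).split()))        # stays roughly constant
```

Under this kind of scheme, the per-turn input length no longer grows with the number of turns, which is the source of the memory and computation savings claimed above.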