Recently, there has been progress in supervised fine-tuning of pretrained GPT-2 to build end-to-end task-oriented dialog (TOD) systems. However, online reinforcement learning of a GPT-2 based dialog system (DS) together with an end-to-end user simulator (US) has not been explored. Moreover, a drawback of existing GPT-2 based TOD systems is that they mostly employ the whole dialog history as input, which is inefficient in both memory and compute. In this paper, we first propose Simplified Generative Architectures (SGA) for the DS and the US respectively, both based on GPT-2 but using a shortened history. We then develop Jointly Reinforced US and DS, called SGA-JRUD. Our DS with the proposed SGA, when trained with supervised learning alone, achieves state-of-the-art performance on MultiWOZ2.1 and is more compute-efficient in both training and generation. Extensive experiments on MultiWOZ2.1 further show the superiority of SGA-JRUD in both offline and online evaluations.
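To make the whole-history versus shortened-history contrast concrete, the sketch below compares how the two input sequences could be assembled for a GPT-2 based TOD system. The special tokens and the exact context retained by SGA (here, hypothetically, the previous belief state, last system response, and current user utterance) are illustrative assumptions, not the paper's specification.

```python
# Illustrative sketch only: contrasts a full-history input with a shortened-history
# input for a GPT-2 based TOD system. Token markers and the retained context are
# assumptions for illustration, not the SGA-JRUD specification.

def full_history_input(turns):
    # turns: list of (user_utterance, belief_state, db_result, system_response)
    tokens = []
    for user, belief, db, resp in turns:
        tokens += ["<sos_u>", user, "<eos_u>",
                   "<sos_b>", belief, "<eos_b>",
                   "<sos_db>", db, "<eos_db>",
                   "<sos_r>", resp, "<eos_r>"]
    # The sequence grows linearly with dialog length, inflating memory and compute.
    return " ".join(tokens)

def shortened_history_input(prev_belief, prev_response, user_utterance):
    # Hypothetical shortened context: only the previous belief state, the last
    # system response, and the current user utterance are kept.
    return " ".join(["<sos_b>", prev_belief, "<eos_b>",
                     "<sos_r>", prev_response, "<eos_r>",
                     "<sos_u>", user_utterance, "<eos_u>"])
```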