Building user simulators (USs) for reinforcement learning (RL) of task-oriented dialog systems (DSs) has attracted growing attention, but it still faces several fundamental challenges. First, it is unclear whether pretrained language models can be leveraged to design USs, for example GPT-2 based USs, that can keep pace and interact with recently advanced GPT-2 based DSs. Second, an important ingredient of a US is that the user goal can be effectively incorporated and tracked; yet how to flexibly integrate goal state tracking and develop an end-to-end trainable US for multiple domains remains a challenge. In this work, we propose a generative user simulator (GUS) with a GPT-2 based architecture and goal state tracking to address these two challenges. Extensive experiments are conducted on MultiWOZ2.1. Different DSs are trained via RL with GUS, the classic agenda-based user simulator (ABUS), and other ablation simulators, respectively, and are compared in cross-model evaluation, corpus-based evaluation, and human evaluation. GUS achieves superior results in all three evaluation tasks.