One of the difficulties in training dialogue systems is the lack of training data. We explore the possibility of creating dialogue data through the interaction between a dialogue system and a user simulator. Our goal is to develop a modelling framework that can incorporate new dialogue scenarios through self-play between the two agents. In this framework, we first pre-train the two agents on a collection of source domain dialogues, which equips the agents to converse with each other via natural language. With further fine-tuning on a small amount of target domain data, the agents continue to interact with the aim of improving their behaviors using reinforcement learning with structured reward functions. In experiments on the MultiWOZ dataset, two practical transfer learning problems are investigated: 1) domain adaptation and 2) single-to-multiple domain transfer. We demonstrate that the proposed framework is highly effective in bootstrapping the performance of the two agents in transfer learning. We also show that our method leads to improvements in dialogue system performance on complete datasets.