Due to the discrete nature of words, language GANs must be optimized from rewards provided by discriminator networks, via reinforcement learning methods. This setting is much harder than that of continuous tasks, which benefit from gradient flows from the discriminator to the generator, and it usually leads to severe learning instabilities. However, we claim that this can be solved by making the discriminator and generator networks cooperate to produce output sequences during training. These cooperative outputs, inherently built to obtain higher discrimination scores, not only provide denser rewards for training, but also form a more compact artificial set for discriminator training, improving its accuracy and stability. In this paper, we show that our SelfGAN framework, built on this cooperative principle, outperforms Teacher Forcing and obtains state-of-the-art results on two challenging tasks: Summarization and Question Generation.
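To make the cooperative principle concrete, here is a minimal, hypothetical sketch (not the paper's exact algorithm): the generator proposes candidate sequences and the discriminator re-ranks them, so the selected output both carries a dense reward signal and serves as a harder artificial example. It assumes a HuggingFace-style `generate` interface and a discriminator that scores whole sequences; all names are illustrative.

```python
import torch

def cooperative_decode(generator, discriminator, src_ids,
                       num_candidates=8, max_len=64):
    """Sketch of discriminator-guided decoding under assumed interfaces."""
    # Generator proposes several candidate sequences (e.g. via beam search).
    candidates = generator.generate(
        src_ids,
        num_beams=num_candidates,
        num_return_sequences=num_candidates,
        max_length=max_len,
    )  # shape: (num_candidates, seq_len)

    # Discriminator assigns each candidate a "human-likeness" score
    # (hypothetical scoring head returning one scalar per sequence).
    with torch.no_grad():
        scores = discriminator(candidates)  # shape: (num_candidates,)

    # The cooperative output is the candidate the discriminator prefers:
    # it yields a denser reward target for the generator and a more
    # challenging artificial example for discriminator training.
    best = scores.argmax()
    return candidates[best], scores[best]
```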