One of the challenging problems in sequence generation tasks is generating sequences optimized for specific desired goals. Current sequential generative models are mainly trained to closely mimic the training data, without directly optimizing goals or properties specific to the task. We introduce OptiGAN, a generative model that incorporates both Generative Adversarial Networks (GANs) and Reinforcement Learning (RL) to optimize desired goal scores using policy gradients. We apply our model to text and real-valued sequence generation, where it achieves higher desired scores than GAN and RL baselines while not sacrificing output sample diversity.
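The abstract states that OptiGAN optimizes goal scores with policy gradients on top of a GAN. As a minimal sketch of that idea (not the paper's exact objective; the mixing weight $\lambda$, discriminator $D_\phi$, and goal-score function $s(\cdot)$ are illustrative assumptions), a REINFORCE-style gradient with a combined reward could take the form
\[
\nabla_\theta J(\theta) \;=\; \mathbb{E}_{x \sim G_\theta}\!\left[ \big( (1-\lambda)\, D_\phi(x) + \lambda\, s(x) \big)\, \nabla_\theta \log p_\theta(x) \right],
\]
where $G_\theta$ is the generator, $D_\phi(x)$ the discriminator's realism score, and $s(x)$ the task-specific goal score of a generated sequence $x$.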