Likelihood training and maximization-based decoding result in dull and repetitive generated texts even when using powerful language models (Holtzman et al., 2019). Adding a loss function for regularization was shown to improve text generation output by helping avoid unwanted properties, such as contradiction or repetition (Li et al., 2020). In this work, we propose fine-tuning a language model by using policy gradient reinforcement learning, directly optimizing for better generation. We apply this approach to minimizing repetition in generated text, and show that, when combined with unlikelihood training (Welleck et al., 2020), our method further reduces repetition without degrading language model quality. We also evaluate other methods for improving generation at training and decoding time, and compare them using various metrics aimed at assessing control over the quality of the generated text.
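To make the approach concrete, the sketch below shows a REINFORCE-style policy-gradient update for a causal language model, using a simple repetition penalty (the fraction of repeated bigrams in a sampled continuation) as the reward. The choice of model (gpt2 via Hugging Face Transformers), the reward definition, and all hyperparameters are illustrative assumptions rather than the exact setup evaluated in this work, which additionally combines the policy-gradient objective with unlikelihood training (Welleck et al., 2020).

```python
# Minimal REINFORCE-style sketch: fine-tune a causal LM with a reward that
# penalizes repetition. Model, reward, and hyperparameters are illustrative
# assumptions, not the paper's exact configuration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def repetition_reward(token_ids):
    """Negative fraction of repeated bigrams in the sampled continuation."""
    bigrams = list(zip(token_ids, token_ids[1:]))
    if not bigrams:
        return 0.0
    return -(1.0 - len(set(bigrams)) / len(bigrams))

prompt = "The meaning of life is"
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

for step in range(100):
    # Sample a continuation from the current policy (the language model).
    with torch.no_grad():
        sampled = model.generate(
            prompt_ids, do_sample=True, top_p=0.9,
            max_new_tokens=40, pad_token_id=tokenizer.eos_token_id,
        )
    continuation = sampled[0, prompt_ids.shape[1]:]

    # Score the sampled continuation with the repetition-based reward.
    reward = repetition_reward(continuation.tolist())

    # REINFORCE: weight the log-likelihood of the sampled tokens by the reward.
    logits = model(sampled).logits[:, :-1, :]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(
        -1, sampled[:, 1:].unsqueeze(-1)
    ).squeeze(-1)
    # Only the continuation tokens contribute to the policy gradient.
    cont_log_probs = token_log_probs[:, prompt_ids.shape[1] - 1:]
    loss = -reward * cont_log_probs.sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice, a baseline (e.g. a running mean of recent rewards) would typically be subtracted from the reward to reduce the variance of the gradient estimate, and the policy-gradient loss would be mixed with the standard likelihood (or unlikelihood) loss so that fluency is not sacrificed for the reward.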