Generate-then-rank is a widely used mechanism for text generation, in which a generator produces multiple candidates and a ranker chooses the best one. However, existing methods usually train the generator and the ranker separately, which deprives them of mutual feedback and misaligns their objectives, resulting in suboptimal generation quality. To address this issue, we propose JGR, a novel joint training algorithm that integrates the generator and the ranker in a single framework. JGR optimizes the generator with a hybrid objective that combines data likelihood and ranker reward, and trains the ranker with a contrastive loss over the generator's own outputs. By alternately updating the generator and the ranker, JGR effectively harmonizes their learning and improves both jointly. We evaluate JGR on various text generation tasks and demonstrate that it surpasses existing methods on four public datasets across three common generation scenarios. Our code, data, and models are available at https://github.com/microsoft/AdvNLG.
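The two objectives described above can be sketched in a few lines. This is a minimal, illustrative sketch on toy scalar scores, not the paper's implementation: the function names, the mixing weight `alpha`, and the margin form of the contrastive loss are all assumptions made here for clarity; the real system applies these losses to neural sequence models.

```python
def hybrid_generator_loss(log_likelihood, ranker_reward, alpha=0.5):
    """Hybrid generator objective (sketch): a weighted mix of data
    log-likelihood and the reward assigned by the ranker. Both terms
    are to be maximized, so the loss is their negated combination."""
    return -(alpha * log_likelihood + (1.0 - alpha) * ranker_reward)

def contrastive_ranker_loss(positive_score, negative_scores, margin=1.0):
    """Contrastive ranker objective (sketch, margin form): the best
    candidate among the generator's outputs should be scored at least
    `margin` above every other candidate."""
    return sum(max(0.0, margin - (positive_score - s))
               for s in negative_scores)

# One alternating round on toy numbers: first the generator is updated
# against the hybrid loss, then the ranker against the contrastive loss
# computed over the generator's sampled candidates.
gen_loss = hybrid_generator_loss(log_likelihood=-2.0, ranker_reward=0.8)
rank_loss = contrastive_ranker_loss(positive_score=0.9,
                                    negative_scores=[0.2, 0.5])
print(gen_loss)   # 0.6
print(rank_loss)  # 0.9
```

Alternating the two updates is what couples the models: the ranker's scores feed back into the generator's reward term, while the generator supplies fresh candidates for the ranker's contrastive comparisons.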