Given an input sequence (or prefix), modern language models often assign high probabilities to output sequences that are repetitive, incoherent, or irrelevant to the prefix; consequently, model-generated text also contains such artifacts. To address these issues, we present RankGen, a 1.2B-parameter encoder model that scores model generations given a prefix. RankGen can be flexibly incorporated as a scoring function in beam search and used to decode from any pretrained language model. We train RankGen using large-scale contrastive learning to map a prefix close to the ground-truth sequence that follows it and far away from two types of negatives: (1) random sequences from the same document as the prefix, which discourage topically similar but irrelevant generations; and (2) sequences generated from a large language model conditioned on the prefix, which discourage repetition and hallucination. Experiments across four different language models (345M to 11B parameters) and two domains show that RankGen significantly outperforms decoding algorithms like nucleus, top-k, and typical sampling on both automatic metrics (85.0 vs. 77.3 MAUVE) and human evaluations with English writers (74.5% human preference over nucleus sampling). Analysis reveals that RankGen outputs are more relevant to the prefix and improve continuity and coherence compared to baselines. We open-source our model checkpoints, code, and human preference data with detailed explanations to support future research.
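To make the scoring-and-reranking idea concrete, below is a minimal Python sketch of how an encoder score can rerank candidate continuations during decoding. The functions `encode_prefix` and `encode_suffix` are hypothetical placeholders standing in for the RankGen encoder (they are not its actual API), and the dot-product scoring and top-k pruning are only an illustration of how such a score could plug into a beam-search-style loop.

```python
# Sketch: reranking sampled continuations with a prefix/suffix encoder score.
# encode_prefix / encode_suffix are hypothetical stand-ins for RankGen;
# any model mapping text to fixed-size vectors could be substituted here.
import torch


def encode_prefix(text: str) -> torch.Tensor:
    # Placeholder: deterministic random unit vector instead of a real encoding.
    g = torch.Generator().manual_seed(abs(hash(text)) % (2**31))
    v = torch.randn(1024, generator=g)
    return v / v.norm()


def encode_suffix(text: str) -> torch.Tensor:
    # Placeholder: in practice the suffix encoder is a separate tower.
    return encode_prefix("suffix: " + text)


def rankgen_style_score(prefix: str, continuation: str) -> float:
    # Score a candidate by the dot product between the prefix embedding
    # and the continuation embedding (higher = better match).
    return float(encode_prefix(prefix) @ encode_suffix(continuation))


def rerank(prefix: str, candidates: list[str], k: int = 1) -> list[str]:
    # Keep the top-k candidates under the encoder score; in a beam-search
    # variant this pruning would run after each expansion of the beams.
    ranked = sorted(
        candidates,
        key=lambda c: rankgen_style_score(prefix, c),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    prefix = "The storm rolled in just after midnight,"
    candidates = [
        " and the harbor lights flickered out one by one.",
        " and the storm rolled in just after midnight once again.",
        " bananas are an excellent source of potassium.",
    ]
    print(rerank(prefix, candidates, k=1))
```

In the sketch, the candidate list would come from sampling the base language model conditioned on the prefix; the encoder score then decides which candidates survive, which is how a learned scorer can steer decoding away from repetitive or off-topic continuations.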