Pre-trained language models have been successful in natural language generation (NLG) tasks. While various decoding methods have been employed, they often produce suboptimal results. We first present an empirical analysis of three NLG tasks: summarization, machine translation, and constrained text generation. We find that selecting the best output from the candidates produced by multiple decoding methods can significantly improve performance. To further improve reranking for NLG tasks, we propose a novel method, \textsc{PairReranker}, which uses a single encoder and a pairwise loss function to jointly encode a source input together with a pair of candidates and compare them. Experiments on the three NLG tasks demonstrate the effectiveness and flexibility of \textsc{PairReranker}, which shows strong results compared with previous baselines. In addition, \textsc{PairReranker} generalizes well: it significantly improves GPT-3 (text-davinci-003) results (e.g., by 24.55\% on CommonGen and 11.35\% on WMT18 zh-en), even though our rerankers are never trained with any GPT-3 candidates.
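To make the single-encoder, pairwise comparison concrete, the sketch below illustrates one plausible implementation of this kind of reranker. It is a minimal illustration under stated assumptions, not the authors' released code: the choice of \texttt{roberta-base}, the separator-token concatenation of the source with both candidates, and the binary cross-entropy pairwise loss are assumptions made for the example.

\begin{verbatim}
# Minimal sketch of a single-encoder pairwise reranker (illustrative,
# not the paper's official implementation). Assumptions: roberta-base
# backbone, separator-joined (source, cand_a, cand_b) input, and a
# BCE-with-logits pairwise preference loss.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class PairwiseReranker(nn.Module):
    def __init__(self, encoder_name="roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Head maps the pooled pair representation to one preference logit.
        self.head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # first-token pooling
        # logit > 0 means candidate A is predicted to be better than B.
        return self.head(pooled).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = PairwiseReranker()

def encode_pair(source, cand_a, cand_b):
    # Jointly encode the source with BOTH candidates in one sequence so
    # the encoder can attend across them (the pairwise comparison).
    text = (f"{source} {tokenizer.sep_token} {cand_a} "
            f"{tokenizer.sep_token} {cand_b}")
    return tokenizer(text, return_tensors="pt",
                     truncation=True, max_length=512)

batch = encode_pair("source input text", "candidate A", "candidate B")
logit = model(batch["input_ids"], batch["attention_mask"])
# Pairwise target: 1.0 if candidate A scores higher on the task metric
# than candidate B, else 0.0 (assumed loss formulation).
loss = nn.functional.binary_cross_entropy_with_logits(
    logit, torch.ones_like(logit))
\end{verbatim}

At inference time, one would score each candidate pair (in both orders, to reduce position bias) and aggregate the per-candidate logits to select a final output; the aggregation details are omitted in this sketch.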