Likelihood, although useful as a training loss, is a poor search objective for guiding open-ended generation from language models (LMs). Existing generation algorithms must avoid both unlikely strings, which are incoherent, and highly likely ones, which are short and repetitive. We propose contrastive decoding (CD), a more reliable search objective that returns the difference between likelihood under a large LM (called the expert, e.g. OPT-13b) and a small LM (called the amateur, e.g. OPT-125m). CD is inspired by the fact that the failures of larger LMs are even more prevalent in smaller LMs, and that this difference signals exactly which texts should be preferred. CD requires zero training, and produces higher quality text than decoding from the larger LM alone. It also generalizes across model types (OPT and GPT2) and significantly outperforms four strong decoding algorithms in automatic and human evaluations.
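To make the objective concrete, here is a minimal runnable sketch of the expert-minus-amateur scoring the abstract describes, using greedy token-by-token search. It is an illustration under stated assumptions, not the authors' implementation: the gpt2-xl / gpt2 pair stands in for the OPT-13b / OPT-125m pair named above so the example stays lightweight, the plausibility cutoff `alpha` (restricting candidates to tokens the expert itself deems reasonably likely) is a detail from the full method rather than this abstract, and the function name `contrastive_decode` is hypothetical.

```python
# Sketch of contrastive decoding: pick the token maximizing
# log p_expert(token) - log p_amateur(token), restricted to tokens
# the expert assigns non-negligible probability.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")  # expert and amateur share a vocabulary
expert = AutoModelForCausalLM.from_pretrained("gpt2-xl").to(device).eval()
amateur = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

@torch.no_grad()
def contrastive_decode(prompt, max_new_tokens=64, alpha=0.1):
    ids = tok(prompt, return_tensors="pt").input_ids.to(device)
    for _ in range(max_new_tokens):
        # Next-token log-probabilities under each model.
        log_p_exp = expert(ids).logits[0, -1].log_softmax(-1)
        log_p_ama = amateur(ids).logits[0, -1].log_softmax(-1)
        # Keep only tokens the expert finds plausible, so the difference
        # cannot reward strings both models consider unlikely (incoherent).
        keep = log_p_exp >= log_p_exp.max() + math.log(alpha)
        score = log_p_exp - log_p_ama  # the contrastive objective
        score[~keep] = float("-inf")
        next_id = score.argmax().view(1, 1)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

print(contrastive_decode("The city council met on Tuesday to"))
```

Because the amateur makes the expert's failure modes (e.g. repetition) even more strongly, subtracting its log-probability penalizes exactly those continuations while leaving the expert's genuinely preferred text intact; no training is involved, only two forward passes per step.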