We explore incorporating natural language inference (NLI) into the text generation pipeline by using a pre-trained NLI model to assess whether a generated sentence entails, contradicts, or is neutral to the prompt and preceding text. First, we show that the NLI task is predictive of generation errors made by GPT-3. We use these results to develop an NLI-informed generation procedure for GPT-J. We then evaluate these generations by obtaining human annotations of error types and overall quality. We find that an NLI strategy of maximizing entailment improves text generation when the nucleus sampling randomness parameter value is high, while a strategy of maximizing contradiction is in fact productive when the parameter value is low. Overall, though, we demonstrate that an NLI strategy of maximizing the neutral class provides the highest quality generated text (significantly better than the vanilla generations), regardless of parameter value.
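To make the procedure concrete, the following is a minimal sketch of one plausible NLI-informed decoding loop: sample several nucleus-sampling candidates from a causal LM, score each candidate against the context with an off-the-shelf NLI model, and keep the candidate that maximizes the target class (here, neutral). The model names, candidate count k, top_p value, and scoring details are illustrative assumptions, not the exact procedure described above.

```python
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

# Illustrative model choices: any causal LM stands in for GPT-J here,
# and roberta-large-mnli predicts [contradiction, neutral, entailment].
gen_name = "EleutherAI/gpt-j-6B"
nli_name = "roberta-large-mnli"

gen_tok = AutoTokenizer.from_pretrained(gen_name)
gen_lm = AutoModelForCausalLM.from_pretrained(gen_name)
nli_tok = AutoTokenizer.from_pretrained(nli_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_name)

def nli_probs(premise: str, hypothesis: str) -> torch.Tensor:
    """Return [contradiction, neutral, entailment] probabilities."""
    enc = nli_tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli_model(**enc).logits
    return logits.softmax(dim=-1).squeeze(0)

def nli_informed_continue(context: str, k: int = 8, top_p: float = 0.9,
                          target: int = 1) -> str:
    """Sample k nucleus-sampling continuations and keep the one that
    maximizes the target NLI class (1 = neutral for roberta-large-mnli)."""
    inputs = gen_tok(context, return_tensors="pt")
    outs = gen_lm.generate(**inputs, do_sample=True, top_p=top_p,
                           max_new_tokens=40, num_return_sequences=k,
                           pad_token_id=gen_tok.eos_token_id)
    # Strip the prompt tokens so only the continuations are scored.
    new_tokens = outs[:, inputs["input_ids"].shape[1]:]
    candidates = gen_tok.batch_decode(new_tokens, skip_special_tokens=True)
    return max(candidates, key=lambda c: nli_probs(context, c)[target].item())

print(nli_informed_continue("The museum opened a new wing last spring."))
```

Changing target to 2 (entailment) or 0 (contradiction) yields the other two strategies compared above; the neutral-maximizing setting is the one the abstract reports as best across parameter values.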