Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sensitive to specific inputs and surface features. Despite dramatic increases in generated text quality as models scale to hundreds of billions of parameters, the models are still prone to non-factual responses, commonsense errors, memorized text, and social biases. Many of these weaknesses can be framed as over-generalizations or under-generalizations of learned patterns in text. We synthesize recent results to highlight what is currently known about what large language models can and cannot do.