Abstract reasoning is a key ability for an intelligent system. Large language models achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human abstract reasoning is also imperfect, and depends on our knowledge and beliefs about the content of the reasoning problem. For example, humans reason much more reliably about logical rules that are grounded in everyday situations than about arbitrary rules concerning abstract attributes. The training experiences of language models similarly endow them with prior expectations that reflect human knowledge and beliefs. We therefore hypothesized that language models would show human-like content effects on abstract reasoning problems. We explored this hypothesis across three logical reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task (Wason, 1968). We find that state-of-the-art large language models (with 7 or 70 billion parameters; Hoffmann et al., 2022) reflect many of the same patterns observed in humans across these tasks -- like humans, models reason more effectively about believable situations than unrealistic or abstract ones. Our findings have implications for understanding both these cognitive effects and the factors that contribute to language model performance.