Human language offers a powerful window into our thoughts -- we tell stories, give explanations, and express our beliefs and goals through words. Abundant evidence also suggests that language plays a developmental role in structuring our learning. Here, we ask: how much of human-like thinking can be captured by learning statistical patterns in language alone? We first contribute a new challenge benchmark for comparing humans and distributional large language models (LLMs). Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization to new, out-of-distribution problems expressed in language. We find that humans are far more robust than LLMs on this benchmark. Next, we propose a hybrid Parse-and-Solve model, which augments distributional LLMs with a structured symbolic reasoning module. We find that this model shows more robust adaptation to out-of-distribution planning problems, demonstrating the promise of hybrid AI models for more human-like reasoning.