Analogical reasoning -- the capacity to identify and map structural relationships between different domains -- is fundamental to human cognition and learning. Recent studies have shown that large language models (LLMs) can sometimes match humans on analogical reasoning tasks, raising the possibility that analogical reasoning might emerge from domain-general processes. However, it remains debated whether these emergent capacities are largely superficial and limited to simple relations seen during training, or whether they encompass the flexible representational and mapping capabilities that are the focus of leading cognitive models of analogy. In this study, we introduce novel analogical reasoning tasks that require participants to map between semantically contentful words and sequences of letters and other abstract characters. These tasks require the ability to flexibly re-represent rich semantic information -- an ability that is known to be central to human analogy but that is not yet well captured by existing cognitive theories and models. We assess the performance of both human participants and LLMs on tasks focused on reasoning from semantic structure and semantic content, introducing variations that test the robustness of their analogical inferences. Advanced LLMs match human performance across several conditions, though humans and LLMs respond differently to certain task variations and semantic distractors. Our results thus provide new evidence that LLMs might offer a how-possibly explanation of human analogical reasoning in contexts that are not yet well modeled by existing theories, but that even today's best models are unlikely to yield how-actually explanations.