We compare human and large language model (LLM) generalization in the number game, a concept inference task. Using a Bayesian model as an analytical framework, we examined the inductive biases and inference strategies of humans and LLMs. The Bayesian model captured human behavior better than it captured LLM behavior: humans flexibly infer both rule-based and similarity-based concepts, whereas LLMs rely more heavily on mathematical rules. Humans also demonstrated few-shot generalization, inferring concepts even from a single example, while LLMs required more examples to generalize. These contrasts highlight fundamental differences in how humans and LLMs infer and generalize mathematical concepts.