Despite excelling at high-level reasoning, current language models lack robustness in real-world scenarios and perform poorly on elementary problem-solving tasks that humans find intuitive. This paper argues that both challenges stem from a core discrepancy between human and machine cognitive development. Although both rely on increasing representational power, language models lack core knowledge, the foundational cognitive structures present in humans, and this absence prevents them from developing robust, generalizable abilities in which complex skills are grounded in simpler ones within their respective domains. The paper reviews empirical evidence for core knowledge in humans, analyzes why language models fail to acquire it, and argues that this limitation is not an inherent architectural constraint. Finally, it outlines a workable proposal for systematically integrating core knowledge into future multimodal language models through the large-scale generation of synthetic training data guided by a cognitive prototyping strategy.