On-the-fly reasoning often requires adaptation to novel problems under limited data and distribution shift. This work introduces CausalARC: an experimental testbed for AI reasoning in low-data and out-of-distribution regimes, modeled after the Abstraction and Reasoning Corpus (ARC). Each CausalARC reasoning task is sampled from a fully specified causal world model, formally expressed as a structural causal model. Principled data augmentations provide observational, interventional, and counterfactual feedback about the world model in the form of few-shot, in-context learning demonstrations. As a proof-of-concept, we illustrate the use of CausalARC for four language model evaluation settings: (1) abstract reasoning with test-time training, (2) counterfactual reasoning with in-context learning, (3) program synthesis, and (4) causal discovery with logical reasoning. Within- and between-model performance varied substantially across tasks, indicating significant room for improvement in language model reasoning.
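The three feedback regimes can be sketched with a toy structural causal model. This is a hypothetical Python illustration, not the benchmark's actual grid-based world models; the variable names, mechanisms, and `ToySCM` class are assumptions chosen only to show how observational, interventional, and counterfactual samples arise from one SCM.

```python
import random


class ToySCM:
    """A minimal structural causal model: X := U_x, Y := 2*X + U_y.

    Hypothetical sketch only. Each sampling mode mirrors one kind of
    CausalARC demonstration: observe the mechanisms as-is, intervene
    via do(), or replay recorded noise under an intervention.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def sample_noise(self):
        # Exogenous noise terms; fixing these fixes the "world".
        return {"u_x": self.rng.choice([0, 1]),
                "u_y": self.rng.choice([0, 1])}

    def forward(self, noise, do=None):
        # Run the structural equations, overriding any intervened variable.
        do = do or {}
        x = do.get("X", noise["u_x"])
        y = do.get("Y", 2 * x + noise["u_y"])
        return {"X": x, "Y": y}

    def observational(self):
        # Sample from the unperturbed model.
        return self.forward(self.sample_noise())

    def interventional(self, do):
        # do(X=x): fresh noise, mechanisms overridden by the intervention.
        return self.forward(self.sample_noise(), do=do)

    def counterfactual(self, noise, do):
        # Counterfactual: reuse the recorded noise (abduction is trivial
        # here because noise is given) and re-run under the intervention.
        return self.forward(noise, do=do)
```

For example, in the world with noise `{"u_x": 0, "u_y": 1}` the factual outcome is `{"X": 0, "Y": 1}`, while the counterfactual under `do(X=1)` is `{"X": 1, "Y": 3}`, since `Y` re-evaluates to `2*1 + 1` with the same noise.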