Despite their linguistic competence, Large Language Models (LLMs) often struggle to reason reliably and flexibly. To identify these shortcomings, we introduce the Non-Linear Reasoning (NLR) dataset, a collection of 55 unique, hand-designed problems that target reasoning bottlenecks arising from the sequential prediction paradigm of LLMs and the inherently linear nature of natural language. NLR tasks require only basic arithmetic to solve, yet demand iterative updates, backtracking, and reasoning across multiple parallel chains of thought. To address these limitations, we propose a neurosymbolic reasoning approach that integrates Prolog, a symbolic reasoning engine, into the inference pipeline of LLMs. This division of labor shifts the LLM's task from performing iterative computations to inferring all relevant information, whether stated explicitly or implied through common sense, and encoding it as logical code; the symbolic engine then carries out the computation. Our method yields large and robust performance gains on the GSM8k and BIG-bench Navigate benchmarks and achieves near-perfect accuracy on NLR problems, remaining robust even as variable interdependence (the number of other variables on which the value of a single variable depends) increases.
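As an illustration of this division of labor (a minimal sketch, not an example taken from the paper), the toy problem below shows what "encoding the stated relationships as logical code" might look like: the model only translates the facts into Prolog clauses, and the Prolog engine derives the answer. The problem and predicate names are hypothetical.

```prolog
% Hypothetical word problem: Carol has 4 apples; Bob has twice as many
% as Carol; Alice has 3 more than Bob. How many apples does Alice have?

carol(4).                          % stated fact
bob(B)   :- carol(C), B is 2 * C.  % Bob's count depends on Carol's
alice(A) :- bob(B),   A is B + 3.  % Alice's count depends on Bob's

% Query:  ?- alice(A).
% Answer: A = 11.
```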