The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models

Evaluating the quality of reasoning traces from large language models remains understudied, labor-intensive, and unreliable: current practice relies on expert rubrics, manual annotation, and slow pairwise judgments. Automated efforts are dominated by graph-based proxies that quantify structural connectivity but do not clarify what constitutes high-quality reasoning; such abstractions can be overly simplistic for inherently complex processes. We introduce a topological data analysis (TDA)-based evaluation framework that captures the geometry of reasoning traces and enables label-efficient, automated assessment. In our empirical study, topological features predict reasoning quality substantially better than standard graph metrics, suggesting that effective reasoning is better captured by higher-dimensional geometric structures than by purely relational graphs. We further show that a compact, stable set of topological features reliably indicates trace quality, offering a practical signal for future reinforcement learning algorithms.
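
As a rough illustration of what a topological feature of a reasoning trace could look like, the sketch below treats the steps of a trace as points in an embedding space and computes 0-dimensional persistence (the scales at which connected components merge in a Vietoris-Rips filtration). This is a minimal sketch under stated assumptions, not the framework described above: the function names `h0_persistence` and `summarize`, the toy 2-D "embeddings", and the choice of summary statistics are all hypothetical, and the abstract does not specify which topological features are actually used.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of 0-dimensional features in a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. A component dies
    when it merges into another one, which happens exactly at the edge
    lengths of a minimum spanning tree (Kruskal's algorithm with union-find).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise Euclidean distances, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at this scale
            deaths.append(dist)
    # n - 1 finite deaths; the last surviving component never dies.
    return deaths


def summarize(deaths):
    """Hypothetical compact feature vector built from the death scales."""
    return {
        "total_persistence": sum(deaths),
        "max_persistence": max(deaths, default=0.0),
        "n_components_merged": len(deaths),
    }


# Toy stand-in for step embeddings of one trace (assumed data, 2-D for clarity):
# three nearby steps and one step far from the rest.
trace = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
deaths = h0_persistence(trace)
features = summarize(deaths)
```

In this toy trace the three nearby steps merge at small scales, while the distant step only joins at a much larger one, so `max_persistence` picks out the outlying step. Higher-dimensional features (loops, voids) would require a full persistent homology library and are outside the scope of this sketch.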
