Despite advances in mathematical reasoning capabilities, Large Language Models (LLMs) still struggle with calculation verification when using established prompting techniques. We present MDToC (Metacognitive Dynamic Tree of Concepts), a three-phase approach that constructs a concept tree, develops accuracy-verified calculations for each concept, and employs majority voting to evaluate competing solutions. Evaluations across the CHAMP, MATH, and Game-of-24 benchmarks demonstrate MDToC's effectiveness: with GPT-4-Turbo, MDToC achieves 58.1\% on CHAMP, 86.6\% on MATH, and 85\% on Game-of-24, outperforming GoT by 5\%, 5.4\%, and 4\% on these tasks, respectively, without hand-engineered hints. MDToC also consistently surpasses existing prompting methods across all backbone models, with improvements of up to 7.6\% over ToT and 6.2\% over GoT, establishing metacognitive calculation verification as a promising direction for enhanced mathematical reasoning.
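The final phase, selecting among competing solutions by majority vote, can be sketched as below. This is a minimal illustration under the assumption that each candidate solution reduces to a final-answer string; it is not the paper's implementation, and the example answers are hypothetical.

```python
from collections import Counter

def majority_vote(candidates):
    """Return the most frequent final answer among candidate solutions.

    Each element of `candidates` is assumed to be the final-answer string
    extracted from one verified solution path (an assumption for this sketch).
    """
    counts = Counter(candidates)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical final answers from several independently generated paths
candidates = ["24", "24", "18", "24", "18"]
print(majority_vote(candidates))  # prints "24"
```

In practice a voting step like this trades extra sampling cost for robustness: an occasional faulty calculation path is outvoted by the consistent majority.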