Large Language Models (LLMs) often rely on long chain-of-thought (CoT) reasoning to solve complex tasks. While effective, these trajectories are frequently inefficient, leading to high latency from excessive token generation, or unstable reasoning that alternates between underthinking (shallow, inconsistent steps) and overthinking (repetitive, verbose reasoning). In this work, we study the structure of reasoning trajectories and uncover specialized attention heads that correlate with distinct cognitive behaviors such as verification and backtracking. By lightly intervening on these heads at inference time, we can steer the model away from inefficient modes. Building on this insight, we propose CREST, a training-free method for Cognitive REasoning Steering at Test-time. CREST has two components: (1) an offline calibration step that identifies cognitive heads and derives head-specific steering vectors, and (2) an inference-time procedure that rotates hidden representations to suppress components along those vectors. CREST adaptively suppresses unproductive reasoning behaviors, yielding both higher accuracy and lower computational cost. Across diverse reasoning benchmarks and models, CREST improves accuracy by up to 17.5% while reducing token usage by 37.6%, offering a simple and effective pathway to faster, more reliable LLM reasoning.
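The inference-time suppression described above can be sketched as a projection that removes, for each hidden state, the component along a calibrated steering vector. This is a minimal illustration, not the paper's exact procedure: the function name, the `alpha` strength parameter, and the use of a plain projection (rather than the rotation CREST describes) are assumptions for exposition.

```python
import numpy as np

def suppress_direction(hidden, steering_vec, alpha=1.0):
    """Suppress the component of each hidden state along a steering vector.

    hidden:       (seq_len, d_model) array of hidden states
    steering_vec: (d_model,) head-specific steering direction
    alpha:        suppression strength in [0, 1]; 1.0 removes the
                  component entirely (illustrative parameter)
    """
    # Normalize the steering direction to unit length
    v = steering_vec / np.linalg.norm(steering_vec)
    # Scalar component of each hidden state along v, shape (seq_len,)
    coeff = hidden @ v
    # Subtract the (scaled) component along v from every position
    return hidden - alpha * np.outer(coeff, v)

# Toy example: a state with a component along v loses exactly that component
v = np.array([1.0, 0.0, 0.0])
h = np.array([[2.0, 3.0, 0.0]])
out = suppress_direction(h, v)  # → [[0.0, 3.0, 0.0]]
```

With `alpha=1.0` the result is orthogonal to the steering direction; intermediate values would only attenuate the targeted behavior, which is one plausible way to make the intervention "light" and adaptive.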