Large language models (LLMs) often generate hallucinated content that lacks factual or contextual grounding, limiting their reliability in critical applications. Existing approaches such as supervised fine-tuning and reinforcement learning from human feedback are data-intensive and computationally expensive, while static parameter-editing methods struggle with context-dependent errors and catastrophic forgetting. We propose LLM-CAS, a framework that formulates real-time hallucination correction as a hierarchical reinforcement learning problem. LLM-CAS trains an agent to learn a policy that dynamically selects temporary neuron perturbations during inference, conditioned on the current context. Unlike prior dynamic approaches that rely on heuristic or predefined adjustments, this policy-driven mechanism enables adaptive, fine-grained correction without permanent parameter modification. Experiments across multiple language models demonstrate that LLM-CAS consistently improves factual accuracy, achieving gains of 10.98 percentage points on StoryCloze, 2.71 points on TriviaQA, and 2.06 points on the MC1 score of TruthfulQA, outperforming both static editing methods such as ITI and CAA and the dynamic SADI framework. Overall, LLM-CAS offers an efficient, context-aware solution for improving the reliability of LLMs, with promising potential for future multimodal extensions.
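To make the core idea concrete, the sketch below illustrates one way a context-conditioned policy could apply temporary neuron perturbations during inference without touching model weights. It is a minimal illustration, not the authors' implementation: the `PerturbationPolicy` class, the `run_with_temporary_perturbation` helper, the top-k neuron selection, and the toy stand-in model are all assumptions made for this example, and the actual LLM-CAS policy is trained with hierarchical reinforcement learning over real transformer activations.

```python
# Minimal sketch (assumed, not the paper's code) of context-conditioned,
# temporary neuron perturbation applied at inference time via a forward hook.
import torch
import torch.nn as nn


class PerturbationPolicy(nn.Module):
    """Maps the current hidden-state context to a sparse perturbation
    over neurons (which neurons to shift, and by how much)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score_head = nn.Linear(hidden_dim, hidden_dim)  # which neurons to touch
        self.delta_head = nn.Linear(hidden_dim, hidden_dim)  # how much to shift them

    def forward(self, context: torch.Tensor, k: int = 8) -> torch.Tensor:
        scores = self.score_head(context)
        deltas = self.delta_head(context)
        topk = scores.topk(k, dim=-1).indices                # select k neurons per example
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)
        return mask * deltas                                  # sparse, temporary perturbation


def run_with_temporary_perturbation(model, layer, policy, inputs):
    """Apply the policy's perturbation through a forward hook for one pass only;
    the hook is removed afterwards, so no parameters are permanently modified."""
    def hook(_module, _inp, output):
        return output + policy(output.detach())

    handle = layer.register_forward_hook(hook)
    try:
        with torch.no_grad():
            return model(inputs)
    finally:
        handle.remove()  # perturbation vanishes once the hook is detached


if __name__ == "__main__":
    hidden = 64
    # Toy stand-in for a transformer block whose intermediate activations we perturb.
    model = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
    policy = PerturbationPolicy(hidden)
    x = torch.randn(2, hidden)

    corrected = run_with_temporary_perturbation(model, model[1], policy, x)
    with torch.no_grad():
        baseline = model(x)
    print("outputs differ only while the hook is attached:",
          not torch.allclose(corrected, baseline))
```

In this sketch, the perturbation exists only for the duration of a single forward call, mirroring the paper's contrast with static editing methods that permanently rewrite parameters.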