Large Vision-Language Models (LVLMs) have made impressive progress in multi-modal understanding and generation. However, they remain prone to producing hallucinated content that is inconsistent with the visual input, which limits their reliability in real-world applications. We propose \textbf{CoFi-Dec}, a training-free decoding framework that mitigates hallucinations by integrating generative self-feedback with coarse-to-fine visual conditioning. Inspired by the human visual process, which proceeds from global scene perception to detailed inspection, CoFi-Dec first generates two intermediate textual responses conditioned on coarse- and fine-grained views of the original image. These responses are then rendered into synthetic images by a text-to-image model, forming multi-level visual hypotheses that enrich the grounding cues available at decoding time. To unify the predictions obtained under these multiple visual conditions, we introduce a Wasserstein-based fusion mechanism that aligns their predictive distributions into a geometrically consistent decoding trajectory. This principled fusion reconciles high-level semantic consistency with fine-grained visual grounding, yielding more robust and faithful outputs. Extensive experiments on six hallucination-focused benchmarks show that CoFi-Dec substantially reduces both entity-level and semantic-level hallucinations, outperforming existing decoding strategies. The framework is model-agnostic, requires no additional training, and can be applied seamlessly to a wide range of LVLMs. The implementation is available at https://github.com/AI-Researcher-Team/CoFi-Dec.
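The abstract does not spell out the fusion step; as a concrete illustration, the sketch below assumes the Wasserstein-based fusion computes an entropic-regularized Wasserstein barycenter of the next-token distributions produced under the different visual conditions, with a ground cost derived from token embeddings. The function name, the toy data, and the choice of a Sinkhorn-style solver (iterative Bregman projections, Benamou et al. 2015) are illustrative assumptions, not the paper's verified implementation.

```python
import numpy as np

def sinkhorn_barycenter(A, M, reg=0.1, weights=None, n_iters=200, eps=1e-12):
    """Entropic-regularized Wasserstein barycenter via iterative
    Bregman projections (one plausible realization of the fusion step).

    A       : (d, n) array; each column is a probability distribution
              over the same d-element support (here: the vocabulary).
    M       : (d, d) ground-cost matrix between support elements.
    weights : (n,) barycentric weights, uniform by default.
    Returns : (d,) fused distribution.
    """
    d, n = A.shape
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights)
    K = np.exp(-M / reg)                      # Gibbs kernel
    V = np.ones((d, n))
    for _ in range(n_iters):
        U = A / (K @ V + eps)                 # match each plan's row marginal to its input
        KtU = K.T @ U
        b = np.exp(np.log(np.maximum(V * KtU, eps)) @ w)  # geometric mean of column marginals
        V = b[:, None] / (KtU + eps)          # match column marginals to the barycenter
    return b / b.sum()

# Toy usage: fuse next-token distributions from three visual conditions
# (original image, coarse view, fine view) over a 6-token vocabulary.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))                              # hypothetical token embeddings
M = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)   # embedding-based ground cost
A = rng.dirichlet(np.ones(6), size=3).T                    # (6, 3): one distribution per condition
p_fused = sinkhorn_barycenter(A, M, reg=0.5)
print(p_fused, p_fused.sum())
```

Under this reading, a barycenter fuses the conditions in distribution space under a token-level geometry, rather than by naive probability averaging, which is one way to interpret "aligning predictive distributions into a geometrically consistent decoding trajectory."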