Existing approaches to reasoning segmentation either connect a language model's hidden features directly to a mask decoder or represent positions as text, which limits interpretability and semantic detail. To address this, we present CoPRS, a Multi-modal Chain-of-Thought (MCoT)-based positional perception model that bridges language reasoning to segmentation through a differentiable and interpretable positional prior instantiated as a heatmap. By making the reasoning process explicit via MCoT and expressing it as a dense, differentiable heatmap, this interface improves interpretability, facilitates diagnostic analysis, and yields evidence that is more concentrated on the target. A learnable concentration token aggregates image and reasoning-text features to generate this positional prior, which a lightweight decoder then converts into precise masks, providing a direct connection between reasoning and segmentation. Across the RefCOCO series and ReasonSeg, CoPRS matches or surpasses the best reported metrics on each standard split under comparable protocols, performing at or above the prior state of the art on both validation and test partitions. Extensive experiments show that heatmap quality strongly influences the resulting mask quality, supporting a consistent association between the reasoning output and downstream mask generation. Collectively, these findings support the utility of this paradigm for bridging reasoning and segmentation, demonstrating reasoning-driven concentration on targets and more precise mask prediction. Code, checkpoints, and logs are released at https://github.com/ZhenyuLU-Heliodore/CoPRS.git.
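The pipeline the abstract describes (a learnable concentration token attending over fused image and reasoning-text features to produce a heatmap prior, followed by a lightweight decoder that turns the prior into a mask) can be sketched as below. This is a minimal illustration, not the released implementation: the `ConcentrationHead` module, the single cross-attention layer, and all shapes and dimensions are assumptions for exposition.

```python
import torch
import torch.nn as nn

class ConcentrationHead(nn.Module):
    """Illustrative sketch (hypothetical, not the authors' code): a learnable
    concentration token aggregates image and reasoning-text features into a
    dense heatmap prior, which a small decoder upsamples into mask logits."""

    def __init__(self, dim=256, grid=24):
        super().__init__()
        self.grid = grid
        # learnable concentration token (assumed single query vector)
        self.token = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        # project the attended token into grid x grid heatmap logits
        self.to_heatmap = nn.Linear(dim, grid * grid)
        # lightweight decoder: upsample the heatmap prior to mask logits
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, image_feats, text_feats):
        # image_feats: (B, N_img, dim); text_feats: (B, N_txt, dim)
        context = torch.cat([image_feats, text_feats], dim=1)
        query = self.token.expand(image_feats.size(0), -1, -1)
        agg, _ = self.attn(query, context, context)        # (B, 1, dim)
        heatmap = self.to_heatmap(agg).view(-1, 1, self.grid, self.grid)
        mask_logits = self.decoder(heatmap)                # (B, 1, 16*grid, 16*grid)
        return heatmap, mask_logits

head = ConcentrationHead()
img = torch.randn(2, 576, 256)   # e.g. 24x24 visual tokens
txt = torch.randn(2, 64, 256)    # reasoning-text tokens
heatmap, mask = head(img, txt)
print(heatmap.shape, mask.shape)  # (2, 1, 24, 24), (2, 1, 384, 384)
```

In this reading, the heatmap is the differentiable interface between reasoning and segmentation: gradients from the mask loss flow back through the decoder into the positional prior, which is consistent with the reported dependence of mask quality on heatmap quality.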