Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings

Polyp detectors trained on clean datasets often underperform in real-world endoscopy, where illumination changes, motion blur, and occlusions degrade image quality. Existing approaches struggle with the domain gap between controlled laboratory conditions and clinical practice, where such adverse imaging conditions are common. In this work, we propose AdaptiveDetector, a two-stage detector-verifier framework that pairs a YOLOv11 detector with a vision-language model (VLM) verifier. The detector adjusts its confidence threshold per frame under VLM guidance, while the verifier is fine-tuned with Group Relative Policy Optimization (GRPO) using an asymmetric, cost-sensitive reward function designed to discourage missed detections, a critical clinical requirement. To assess performance under challenging conditions, we construct a synthetic testbed by systematically degrading clean datasets with adverse conditions commonly encountered in clinical practice, which serves as a benchmark for zero-shot evaluation. Zero-shot evaluation on synthetically degraded CVC-ClinicDB and Kvasir-SEG images shows that our approach improves recall by 14 to 22 percentage points over YOLO alone, while precision stays between 0.7 points below and 1.7 points above the baseline. Combining adaptive thresholding with cost-sensitive reinforcement learning thus yields clinically aligned, open-world polyp detection with substantially fewer false negatives, reducing the risk of missed precancerous polyps.
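
To make the idea of an asymmetric, cost-sensitive reward concrete, the following is a minimal sketch rather than the reward actually used in this work. The weight values, the `RewardWeights` and `asymmetric_reward` names, and the binary accept/reject interface for the verifier are all assumptions introduced for illustration. The second function shows one common formulation of the group-relative advantage used in GRPO, in which each sampled response's reward is normalized against the other rewards in its group.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical values chosen for illustration; not taken from the paper.
    true_positive: float = 1.0
    true_negative: float = 1.0
    false_positive: float = -1.0
    false_negative: float = -3.0  # a missed polyp is penalized most heavily


def asymmetric_reward(verifier_says_polyp: bool, is_polyp: bool,
                      w: RewardWeights = RewardWeights()) -> float:
    """Score one verifier decision against the ground-truth label."""
    if verifier_says_polyp and is_polyp:
        return w.true_positive
    if not verifier_says_polyp and not is_polyp:
        return w.true_negative
    if verifier_says_polyp and not is_polyp:
        return w.false_positive
    return w.false_negative  # verifier rejected a real polyp


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one sampled group (zero mean, unit scale)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled verifier answers for a frame that does contain a polyp:
    # three accept the detection, one wrongly rejects it.
    decisions = [True, True, False, True]
    rewards = [asymmetric_reward(d, is_polyp=True) for d in decisions]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because the false-negative penalty is larger in magnitude than the false-positive penalty, a response that rejects a true polyp receives a strongly negative advantage within its group, which is the mechanism by which such a reward would push the verifier toward higher recall.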
