Colorectal cancer (CRC) is a major global health concern, and early detection through screening plays a critical role in reducing mortality. Although deep learning models have shown promise for polyp detection, classification, and segmentation, their ability to generalize across diverse clinical environments, particularly to out-of-distribution (OOD) data, remains limited. Multi-center datasets such as PolypGen have been developed to address this problem, but collecting them is costly and time-consuming. Traditional data augmentation techniques provide limited variability and fail to capture the complexity of medical images. Diffusion models have emerged as a promising way to generate synthetic polyp images, but current models mainly condition generation on segmentation masks, which limits their ability to capture the full clinical context. To overcome these limitations, we propose the Progressive Spectrum Diffusion Model (PSDM), which integrates diverse clinical annotations (such as segmentation masks, bounding boxes, and colonoscopy reports) by transforming them into compositional prompts. These prompts are organized into coarse and fine components, allowing the model to capture both broad spatial structures and fine details and thereby generate clinically accurate synthetic images. Augmenting the training data with PSDM-generated samples significantly improves downstream polyp detection, classification, and segmentation. For instance, on the PolypGen dataset, PSDM increases the F1 score by 2.12% and the mean average precision (mAP) by 3.09%, demonstrating superior performance in OOD scenarios and enhanced generalization.
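The abstract does not specify how annotations are turned into compositional prompts. The Python sketch below illustrates one possible reading of the coarse/fine split and is not the PSDM implementation. Everything in it is a hypothetical assumption chosen only for illustration:

- the names `PolypAnnotation`, `coarse_prompt`, `fine_prompt` and `compositional_prompt`;
- the report keys (`morphology`, `surface`, `pathology`);
- the size thresholds and the three-band location scheme.

In this sketch the coarse component is derived from the mask and bounding box, and the fine component from report fields. It omits the diffusion model and any text encoder.

```python
# Illustrative sketch only: NOT the PSDM implementation. All names, report keys,
# and thresholds below are hypothetical assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PolypAnnotation:
    """Hypothetical container for one image's clinical annotations."""
    mask: List[List[int]]              # binary segmentation mask, 1 = polyp pixel
    bbox: Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels
    report: Dict[str, str] = field(default_factory=dict)  # fields parsed from a report


def _bucket(value: float, low: str, mid: str, high: str) -> str:
    """Map a normalized coordinate in [0, 1] to one of three named bands."""
    if value < 1 / 3:
        return low
    if value > 2 / 3:
        return high
    return mid


def coarse_prompt(ann: PolypAnnotation) -> str:
    """Broad spatial structure: location from the box center, size from the mask area."""
    h, w = len(ann.mask), len(ann.mask[0])
    area_frac = sum(sum(row) for row in ann.mask) / (h * w)
    x_min, y_min, x_max, y_max = ann.bbox
    cx = (x_min + x_max) / 2 / w   # normalized horizontal center
    cy = (y_min + y_max) / 2 / h   # normalized vertical center
    vert = _bucket(cy, "upper", "middle", "lower")
    horiz = _bucket(cx, "left", "center", "right")
    # Hypothetical size thresholds: 5% and 20% of the frame area.
    size = "small" if area_frac < 0.05 else "large" if area_frac > 0.20 else "medium"
    return f"a {size} polyp in the {vert}-{horiz} region of the frame"


def fine_prompt(ann: PolypAnnotation) -> str:
    """Fine details from report fields in a fixed order; missing fields are skipped."""
    keys = ("morphology", "surface", "pathology")  # hypothetical report keys
    return ", ".join(f"{k}: {ann.report[k]}" for k in keys if k in ann.report)


def compositional_prompt(ann: PolypAnnotation) -> Dict[str, str]:
    """Keep the coarse and fine components as separate conditioning strings."""
    return {"coarse": coarse_prompt(ann), "fine": fine_prompt(ann)}


if __name__ == "__main__":
    # 10x10 frame with a 2x2 polyp in the top-left corner.
    mask = [[0] * 10 for _ in range(10)]
    for y in range(2):
        for x in range(2):
            mask[y][x] = 1
    ann = PolypAnnotation(mask, (0, 0, 2, 2), {"morphology": "sessile", "pathology": "adenoma"})
    print(compositional_prompt(ann))
```

The components are returned separately rather than as one concatenated string. This keeps the coarse/fine distinction available to whatever conditioning mechanism consumes them. How PSDM actually encodes and applies the two components is not described in the abstract.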