An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each dataset is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We achieve state-of-the-art results on the Beyond The Cranial Vault (BTCV) benchmark. Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer-learning performance on novel tasks. The CLIP embedding design allows the Universal Model to be extended to new classes without catastrophically forgetting previously learned ones.
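The core architectural idea, conditioning a shared segmentation decoder on CLIP text embeddings of class names, can be sketched compactly. The following is a minimal, hypothetical PyTorch sketch, not the authors' released code: the `TextDrivenHead` module, the prompt template, the feature dimension, and the illustrative class subset are all assumptions for exposition. It shows how a frozen CLIP text encoder can turn class prompts into per-class 1x1x1 convolution parameters, so that adding a new class only requires a new text prompt rather than a new output layer.

```python
# Hypothetical sketch of a CLIP-driven segmentation head (assumptions noted inline).
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPModel

class TextDrivenHead(nn.Module):
    """Maps a CLIP text embedding to the weights and bias of a 1x1x1 conv
    that predicts one binary mask per class from shared decoder features."""
    def __init__(self, clip_dim: int = 512, feat_dim: int = 48):
        super().__init__()
        self.feat_dim = feat_dim
        # Assumed parameter generator: text embedding -> conv weight + bias.
        self.controller = nn.Linear(clip_dim, feat_dim + 1)

    def forward(self, feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feats: [B, C, D, H, W] shared decoder features; text_emb: [K, clip_dim].
        params = self.controller(text_emb)             # [K, C + 1]
        weight = params[:, :self.feat_dim]             # [K, C]
        bias = params[:, self.feat_dim]                # [K]
        # One binary logit map per class via a text-conditioned 1x1x1 conv.
        return F.conv3d(feats, weight[:, :, None, None, None], bias=bias)

# Build class embeddings once from medical prompts (template is an assumption).
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
classes = ["liver", "pancreas", "liver tumor"]  # illustrative subset of the 31 classes
prompts = [f"a computerized tomography of a {c}" for c in classes]
with torch.no_grad():
    tokens = tokenizer(prompts, padding=True, return_tensors="pt")
    text_emb = clip.get_text_features(**tokens)        # [K, 512]

head = TextDrivenHead(clip_dim=text_emb.shape[-1], feat_dim=48)
feats = torch.randn(1, 48, 16, 32, 32)                 # stand-in decoder features
masks = torch.sigmoid(head(feats, text_emb))           # [1, K, 16, 32, 32] per-class masks
```

Under this design, the semantic relationships encoded in the CLIP text space (e.g., between "liver" and "liver tumor") are shared across classes, which is what would allow new classes to be added by appending prompts without retraining or replacing the existing output heads.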