The UNet model consists of fully convolutional network (FCN) layers arranged as a contracting encoder and an upsampling decoder. Nested arrangements of these encoder and decoder maps give rise to extensions of the UNet model, such as UNet^e and UNet++. A further refinement constrains the outputs of the convolutional layers to discriminate between segment labels when trained end to end, a property called deep supervision. This, however, reduces feature diversity in these nested UNet models despite their large parameter space. Furthermore, for texture segmentation, pixel correlations at multiple scales contribute to the classification task; hence, explicit deep supervision of shallower layers is likely to enhance performance. In this paper, we propose ADS_UNet, a stage-wise additive training algorithm that incorporates resource-efficient deep supervision in shallower layers and takes performance-weighted combinations of the sub-UNets to create the segmentation model. We provide empirical evidence on three histopathology datasets to support the claim that the proposed ADS_UNet reduces correlations between constituent features and improves performance while being more resource efficient. We demonstrate that ADS_UNet outperforms state-of-the-art Transformer-based models by 1.08 and 0.6 points on the CRAG and BCSS datasets, respectively, while requiring only 37% of the GPU memory consumption and 34% of the training time that Transformers require.
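The performance-weighted combination of sub-UNets described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each sub-UNet emits a per-pixel class-probability map and that each model's weight is proportional to a validation score (e.g. Dice); the function names and the normalisation scheme are our own for illustration.

```python
import numpy as np

def weighted_ensemble(prob_maps, scores):
    """Fuse per-model class-probability maps with performance-derived weights.

    prob_maps: list of M arrays, each of shape (H, W, C) with class probabilities.
    scores:    M validation scores (higher is better), used to weight the models.
    Returns the fused per-pixel label map of shape (H, W).
    """
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()                           # normalise to a convex combination
    stacked = np.stack(prob_maps)             # (M, H, W, C)
    fused = np.tensordot(w, stacked, axes=1)  # weighted sum over the model axis
    return fused.argmax(axis=-1)              # final per-pixel class labels
```

Because the weights form a convex combination, the fused map remains a valid probability distribution per pixel before the argmax, so better-performing sub-UNets simply pull the decision toward their predictions.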