Deep Learning (DL) methods for medical image segmentation have recently seen explosive growth. However, the field's reliability is hindered by the lack of a common reference for evaluating accuracy and performance, and by the fact that previous studies evaluate on different datasets. This paper presents an extensive comparison of DL models for lung and COVID-19 lesion segmentation in Computerized Tomography (CT) scans, which can also serve as a benchmark for testing medical image segmentation models. Four DL architectures (Unet, Linknet, FPN, PSPNet) are combined with 25 encoders (variations of VGG, DenseNet, ResNet, ResNext, DPN, MobileNet, Xception, Inception-v4, EfficientNet), each used both randomly initialized and pretrained, to construct 200 tested models. Three experimental setups are conducted: lung segmentation, lesion segmentation, and lesion segmentation using the original lung masks. A public COVID-19 dataset of 100 CT scan images (80 for training, 20 for validation) is used for training and validation, and a different public dataset of 829 images from 9 CT scan volumes is used for testing. Multiple findings are reported, including the best architecture-encoder combination for each experiment and the mean Dice results per experiment, per architecture, and per encoder. Finally, the upper-bound improvements obtained by using lung masks as a preprocessing step or by using pretrained models are quantified. The source code and 600 pretrained models for the three experiments are provided, suitable for fine-tuning in experimental setups without GPU capabilities.
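The model counts above follow from a simple grid: 4 architectures × 25 encoders × 2 initializations gives 200 models per experiment, and 3 experiments give 600. The sketch below enumerates that grid and includes a plain-Python Dice coefficient for binary masks. It is an illustration only, not the authors' code: the encoder names are placeholders (the abstract lists encoder families, not the 25 specific variants), the experiment labels are hypothetical, and returning 1.0 for two empty masks is an assumed convention.

```python
from itertools import product

ARCHITECTURES = ["Unet", "Linknet", "FPN", "PSPNet"]
# Placeholder names (assumption): the abstract names encoder families,
# not the 25 specific encoder variants.
ENCODERS = [f"encoder_{i:02d}" for i in range(25)]
INITIALIZATIONS = ["random", "pretrained"]
# Hypothetical labels for the three experimental setups.
EXPERIMENTS = ["lung", "lesion", "lesion_with_lung_mask"]


def model_grid():
    """Return every (architecture, encoder, initialization) combination."""
    return list(product(ARCHITECTURES, ENCODERS, INITIALIZATIONS))


def dice(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    grid = model_grid()
    print(len(grid))                     # 200 models per experiment
    print(len(grid) * len(EXPERIMENTS))  # 600 models in total
    print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
```

Averaging `dice` over the test images, grouped by experiment, architecture, or encoder, would yield mean Dice summaries of the kind the abstract describes.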