Purpose: Accurate segmentation of lung and infection in COVID-19 CT scans plays an important role in the quantitative management of patients. Most existing studies rely on large, private annotated datasets that are impractical to obtain from a single institution, especially when radiologists are busy fighting the coronavirus disease. Furthermore, current COVID-19 CT segmentation methods are hard to compare because they are developed on different datasets, trained in different settings, and evaluated with different metrics. Methods: To promote the development of data-efficient deep learning methods, we build three benchmarks for lung and infection segmentation based on 70 annotated COVID-19 cases. The benchmarks cover currently active research areas, e.g., few-shot learning, domain generalization, and knowledge transfer. For a fair comparison among segmentation methods, we also provide standard training, validation, and testing splits, evaluation metrics, and the corresponding code. Results: Based on a state-of-the-art network, we provide more than 40 pre-trained baseline models, which not only serve as out-of-the-box segmentation tools but also save computational time for researchers interested in COVID-19 lung and infection segmentation. We achieve average Dice Similarity Coefficient (DSC) scores of 97.3%, 97.7%, and 67.3% and average Normalized Surface Dice (NSD) scores of 90.6%, 91.4%, and 70.0% for left lung, right lung, and infection, respectively. Conclusions: To the best of our knowledge, this work presents the first data-efficient learning benchmark for medical image segmentation and the largest number of pre-trained models to date. All of these resources are publicly available, and our work lays the foundation for developing deep learning methods for efficient COVID-19 CT segmentation with limited data.
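For readers unfamiliar with the first reported metric, the following is a minimal sketch of a per-label Dice Similarity Coefficient, DSC = 2|P ∩ G| / (|P| + |G|), where P is the predicted mask and G the ground-truth mask. It is not the benchmark's evaluation code: the function name `dice_coefficient`, the label scheme (0 = background, 1 = left lung, 2 = right lung, 3 = infection), the toy voxel values, and the choice to return 1.0 when both masks are empty are all assumptions made for illustration. NSD is not covered here, since it additionally requires surface extraction and a distance tolerance.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int], label: int = 1) -> float:
    """Dice Similarity Coefficient for one label over flattened voxel labels."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same number of voxels")
    # Binary masks for the requested label.
    pred_mask = [p == label for p in pred]
    truth_mask = [t == label for t in truth]
    # Voxels that carry the label in both prediction and ground truth.
    intersection = sum(p and t for p, t in zip(pred_mask, truth_mask))
    total = sum(pred_mask) + sum(truth_mask)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy volume, flattened to one dimension.
# Assumed labels: 0 = background, 1 = left lung, 2 = right lung, 3 = infection.
truth = [0, 1, 1, 2, 2, 3, 3, 3]
pred = [0, 1, 1, 2, 0, 3, 3, 0]

scores = {label: dice_coefficient(pred, truth, label) for label in (1, 2, 3)}
for label, score in scores.items():
    print(f"label {label}: DSC = {score:.3f}")
```

In this toy case the infection label scores lower than the left lung because one of its three ground-truth voxels is missed. Real CT volumes would typically be handled as 3D arrays rather than flat lists.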