Computed tomography (CT) has been widely explored as a COVID-19 screening and assessment tool to complement RT-PCR testing. To assist radiologists with CT-based COVID-19 screening, a number of computer-aided systems have been proposed; however, many of these systems are built using CT data that is limited in both quantity and diversity. To support the development of machine learning-driven screening systems, we introduce COVIDx CT-3, a large-scale multinational benchmark dataset for detecting COVID-19 cases from chest CT images. COVIDx CT-3 includes 431,205 CT slices from 6,068 patients across at least 17 countries, which to the best of our knowledge makes it the largest and most diverse open-access dataset of COVID-19 CT images. Additionally, we examine the data diversity and potential biases of the COVIDx CT-3 dataset and find that significant geographic and class imbalances remain despite efforts to curate data from a wide variety of sources.