Computed tomography (CT) has been widely explored as a COVID-19 screening and assessment tool to complement RT-PCR testing. A number of computer-aided systems have been proposed to assist radiologists with CT-based COVID-19 screening. However, many of these systems are built using CT data that are limited in both quantity and diversity. To support the development of machine learning-driven screening systems, we introduce COVIDx CT-3, a large-scale multinational benchmark dataset for the detection of COVID-19 cases from chest CT images. COVIDx CT-3 includes 431,205 CT slices from 6,068 patients across at least 17 countries, which, to the best of our knowledge, makes it the largest and most diverse open-access dataset of COVID-19 CT images. We also examine the data diversity and potential biases of COVIDx CT-3 and find that significant geographic and class imbalances remain despite efforts to curate data from a wide variety of sources.