This paper responds to the MIA-COV19 challenge of classifying COVID from non-COVID cases based on CT lung images. The COVID-19 virus has devastated the world over the last eighteen months, infecting more than 182 million people and causing over 3.9 million deaths. The overarching aim is to predict the diagnosis of COVID-19 from chest CT images through the development of explainable vision transformer deep learning techniques, leading to population screening that is more rapid, accurate, and transparent. The competition provides 5381 three-dimensional (3D) datasets in total: 1552 for training, 374 for evaluation, and 3455 for testing. While most of the data volumes are in axial view, a number of subjects' data are in coronal or sagittal views with only 1 or 2 slices in axial view. Hence, although 3D data based classification is investigated, 2D images remain the main focus in this competition. Two deep learning methods are studied: the vision transformer (ViT), which is based on attention models, and DenseNet, which is built upon a conventional convolutional neural network (CNN). Initial evaluation on the validation datasets, for which the ground truth is known, indicates that ViT performs better than DenseNet, with F1 scores of 0.76 and 0.72 respectively. Code is available on GitHub at <https://github.com/xiaohong1/COVID-ViT>.
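To make the comparison concrete, the sketch below shows how the two model families contrasted in this work can be instantiated as binary (COVID vs non-COVID) slice classifiers. This is a minimal illustration, not the authors' released code (which is at the GitHub link above); the `timm` library, the specific model variants (`vit_base_patch16_224`, `densenet121`), and the 224x224 input size are assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the authors' code): instantiating the two
# model families compared in the paper as two-class slice classifiers.
import torch
import timm

# Vision transformer: attention-based classifier over 16x16 image patches.
vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)

# DenseNet: a conventional CNN with densely connected convolutional blocks.
densenet = timm.create_model("densenet121", pretrained=True, num_classes=2)

# A 2D CT slice, replicated to 3 channels and resized to the model input
# resolution, is mapped to two logits: COVID vs non-COVID.
x = torch.randn(1, 3, 224, 224)  # dummy batch of one slice
print(vit(x).shape)       # torch.Size([1, 2])
print(densenet(x).shape)  # torch.Size([1, 2])
```

In this framing, per-slice predictions from either model can then be aggregated over a 3D volume to produce a subject-level diagnosis.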