Under the global COVID-19 crisis, developing a robust diagnosis algorithm for COVID-19 using chest X-rays (CXRs) is hampered by the lack of a well-curated COVID-19 data set, although CXR data for other diseases are abundant. This situation is well suited to the vision transformer architecture, which can exploit abundant unlabeled data through pre-training. However, directly using an existing vision transformer with a corpus generated by a ResNet is not optimal for correct feature embedding. To mitigate this problem, we propose a novel vision transformer that uses a low-level CXR feature corpus obtained by extracting abnormal CXR features. Specifically, the backbone network is trained on large public datasets to capture the abnormal features seen in routine diagnosis, such as consolidation and ground-glass opacity (GGO). The embedded features from the backbone network are then used as the corpus for vision transformer training. We evaluate our model on various external test datasets acquired from entirely different institutions to assess its generalization ability. Our experiments demonstrate that our method achieves state-of-the-art performance and better generalization capability, both of which are crucial for widespread deployment.
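The pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes a pretrained backbone that emits a 16×16 map of 128-dimensional low-level features per CXR (here faked with random numbers), flattens each spatial vector into one token of the feature "corpus", and runs a single self-attention layer with a [CLS] token for classification, as in a standard vision transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Stand-in for the backbone output: a 16x16 grid of 128-d low-level
# CXR features (consolidation, GGO, etc. in the real model).
feat_map = rng.normal(size=(16, 16, 128))
tokens = feat_map.reshape(-1, 128)          # (256, 128) feature corpus

# Prepend a learnable [CLS] token, as in standard ViT classification.
cls_token = rng.normal(size=(1, 128))
x = np.concatenate([cls_token, tokens], axis=0)   # (257, 128)

# One single-head self-attention layer over the feature tokens.
Wq, Wk, Wv = (rng.normal(size=(128, 128)) / np.sqrt(128) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
attn = softmax(q @ k.T / np.sqrt(128))      # (257, 257) attention weights
out = attn @ v                              # (257, 128) mixed token features

# Classify from the [CLS] embedding (e.g. COVID-19 vs. other).
W_head = rng.normal(size=(128, 2)) / np.sqrt(128)
logits = out[0] @ W_head
print(logits.shape)
```

In the actual method, the backbone is trained on large public CXR datasets first, so its feature tokens already encode radiologically meaningful abnormalities before the transformer sees them.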