Correcting Data Imbalance for Semi-Supervised Covid-19 Detection Using X-ray Chest Images

Saul Calderon-Ramirez, Shengxiang-Yang, Armaghan Moemeni, David Elizondo, Simon Colreavy-Donnelly, Luis Fernando Chavarria-Estrada, Miguel A. Molina-Cabello

From arXiv; under journal review

The Corona Virus (COVID-19) is an international pandemic that has quickly propagated throughout the world. The application of deep learning to the classification of chest X-ray images of Covid-19 patients could become a novel pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in the context of a new, highly infectious disease, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch using a very limited number of labelled observations and a highly imbalanced labelled dataset. We propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we propose using the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The MixMatch method combined with the proposed pseudo-label based balance correction improved classification accuracy by up to 10% with respect to the non-balanced MixMatch algorithm, with statistical significance. We tested our proposed approach on several available datasets using 10, 15 and 20 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
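The re-weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the inverse-frequency weight formula, the function names, and the use of the arg-max of a (possibly soft) label to pick the weight are all assumptions made for illustration. The only parts taken from the MixMatch method itself are that the labelled term is a cross-entropy and the unlabelled term is a squared error against the pseudo-labels.

```python
import numpy as np


def inverse_frequency_weights(class_counts):
    """Assumed weighting scheme: weights proportional to 1 / class count,
    normalised to sum to 1, so the rarer class gets the larger weight."""
    counts = np.asarray(class_counts, dtype=float)
    inverse = 1.0 / counts
    return inverse / inverse.sum()


def weighted_labelled_loss(probs, targets, class_weights):
    """Cross-entropy on labelled observations, each scaled by the weight of
    its class (arg-max of the target, which may be soft after MixUp)."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    per_observation = -np.sum(targets * np.log(probs), axis=1)
    weights = class_weights[np.argmax(targets, axis=1)]
    return float(np.mean(weights * per_observation))


def weighted_unlabelled_loss(probs, pseudo_labels, class_weights):
    """Squared error against pseudo-labels; the weight of each unlabelled
    observation is chosen from the arg-max class of its pseudo-label."""
    per_observation = np.sum((probs - pseudo_labels) ** 2, axis=1)
    weights = class_weights[np.argmax(pseudo_labels, axis=1)]
    return float(np.mean(weights * per_observation))


if __name__ == "__main__":
    # Hypothetical labelled set: 18 negative and 2 positive observations.
    class_weights = inverse_frequency_weights([18, 2])

    model_probs = np.array([[0.5, 0.5], [0.5, 0.5]])
    one_hot_targets = np.array([[1.0, 0.0], [0.0, 1.0]])
    pseudo_labels = np.array([[0.2, 0.8], [0.9, 0.1]])

    print("class weights:", class_weights)
    print("labelled loss:",
          weighted_labelled_loss(model_probs, one_hot_targets, class_weights))
    print("unlabelled loss:",
          weighted_unlabelled_loss(model_probs, pseudo_labels, class_weights))
```

In this sketch an error on an observation assigned to the under-represented class contributes more to both loss terms than the same error on a majority-class observation, which is the effect the abstract attributes to the balance correction.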
