The coronavirus disease (COVID-19) is an international pandemic that has spread quickly throughout the world. Applying deep learning to the classification of chest X-ray images of COVID-19 patients could become a novel pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in the context of a new, highly infectious disease, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch using a very limited number of labelled observations and a highly imbalanced labelled dataset. We propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we propose using the pseudo-labels and augmented labels calculated by MixMatch to choose the appropriate weight. The MixMatch method combined with the proposed pseudo-label based balance correction improved classification accuracy by up to 10% with respect to the non-balanced MixMatch algorithm, with statistical significance. We tested our proposed approach on several available datasets using 10, 15 and 20 labelled observations. Additionally, a new dataset, composed of chest X-ray images of Costa Rican adult patients, is included among the tested datasets.
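
The re-weighting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`class_weights`, `observation_weights`, `weighted_cross_entropy`), the inverse-frequency weighting rule, and the use of cross-entropy for every observation are assumptions made for illustration. MixMatch itself uses a different loss term for unlabelled data, so the sketch only shows how a per-class weight could be selected from a hard label or from a soft pseudo-label.

```python
import math


def class_weights(labels, num_classes):
    """Inverse-frequency class weights, normalised to sum to 1 (assumed rule)."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    # Classes with no labelled observations get weight 0 in this sketch.
    inv = [1.0 / c if c > 0 else 0.0 for c in counts]
    total = sum(inv)
    return [v / total for v in inv]


def observation_weights(targets, weights):
    """Pick one weight per observation from its most probable class.

    `targets` holds one-hot labels (labelled data) or soft pseudo-labels
    (unlabelled data); in both cases the argmax selects the class weight.
    """
    picked = []
    for t in targets:
        k = max(range(len(t)), key=lambda i: t[i])
        picked.append(weights[k])
    return picked


def weighted_cross_entropy(probs, targets, weights):
    """Mean cross-entropy with each observation scaled by its class weight."""
    eps = 1e-12  # avoids log(0)
    per_obs = observation_weights(targets, weights)
    total = 0.0
    for p, t, w in zip(probs, targets, per_obs):
        ce = -sum(ti * math.log(pi + eps) for ti, pi in zip(t, p))
        total += w * ce
    return total / len(probs)


# Hypothetical toy data: three negative labels and one positive label.
w = class_weights([0, 0, 0, 1], num_classes=2)
loss = weighted_cross_entropy(
    probs=[[0.5, 0.5], [0.5, 0.5]],
    targets=[[1.0, 0.0], [0.0, 1.0]],
    weights=w,
)
```

With these toy labels the under-represented class receives the larger weight, so its errors contribute more to the loss than errors on the majority class.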


