Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, developing a robust deep learning model requires large, high-quality datasets with manual annotations, which are expensive to obtain. As a result, the chest X-rays collected annually in hospitals often go unused for lack of expert labeling, especially in deprived areas. To address this, we present a novel deep learning framework that uses knowledge distillation through self-supervised learning and self-training, and show that the performance of a model originally trained with a small number of labels can be gradually improved as more unlabeled data become available. Experimental results show that the proposed framework remains impressively robust in real-world settings and generalizes to several diagnostic tasks, including tuberculosis, pneumothorax, and COVID-19. Notably, our model performs even better than models trained with the same amount of labeled data. The proposed framework therefore holds great promise for medical imaging, where large volumes of data accumulate every year but ground-truth annotations are expensive to obtain.
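To make the teacher-student self-training idea concrete, the following is a minimal sketch, not the authors' implementation: it replaces the deep networks with a toy one-dimensional threshold classifier, and all data, thresholds, and function names (`fit_threshold`, `predict`) are hypothetical illustrations. The mechanism shown is the same, though: a teacher trained on a small labeled set pseudo-labels an unlabeled pool, and a student is then trained on the combined labeled and pseudo-labeled data.

```python
def fit_threshold(xs, ys):
    """Toy 1-D classifier: predict 1 if x >= t, else 0.
    Picks the midpoint threshold that minimizes training errors."""
    pts = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])] or pts
    best_t, best_err = candidates[0], len(xs) + 1
    for t in candidates:
        err = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def predict(t, xs):
    return [int(x >= t) for x in xs]

# Hypothetical toy task: the true label is 1 iff x >= 5.0.
labeled_x = [1.0, 2.0, 8.0, 9.0]                  # small labeled set
labeled_y = [0, 0, 1, 1]
unlabeled_x = [i * 0.05 for i in range(200)]      # large unlabeled pool

# Step 1: train a teacher on the small labeled set.
teacher_t = fit_threshold(labeled_x, labeled_y)

# Step 2: the teacher pseudo-labels the unlabeled pool
# (the distillation signal for the student).
pseudo_y = predict(teacher_t, unlabeled_x)

# Step 3: train a student on labeled + pseudo-labeled data.
student_t = fit_threshold(labeled_x + unlabeled_x, labeled_y + pseudo_y)

# Evaluate the student on a held-out grid.
test_x = [i / 10 for i in range(100)]
test_y = [int(x >= 5.0) for x in test_x]
student_acc = sum(p == y for p, y in
                  zip(predict(student_t, test_x), test_y)) / len(test_x)
print(teacher_t, round(student_t, 3), student_acc)
```

In the paper's setting the pseudo-labeling loop is repeated with progressively larger unlabeled pools, so the student of one round becomes the teacher of the next; this sketch shows a single round of that process.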