We introduce a new dataset, the Synthetic COVID-19 Chest X-ray Dataset, for training machine learning models. The dataset consists of 21,295 synthetic COVID-19 chest X-ray images intended for use in computer-aided diagnosis. These images, generated via an unsupervised domain adaptation approach, are of high quality. We find that, when used as additional training data under heavy class imbalance, the synthetic images not only improve the performance of various deep learning architectures but also allow the target class to be detected with high confidence. We also find that comparable performance can be achieved when models are trained only on synthetic images. Further, salient features of the synthetic COVID-19 images indicate that their distribution differs significantly from that of the non-COVID-19 classes, enabling a proper decision boundary. We hope the availability of such high-fidelity COVID-19 chest X-ray images will encourage advances in the development of diagnostic and/or management tools.