Analyzing medical data to find abnormalities is time-consuming and costly, particularly for rare abnormalities, and requires tremendous effort from medical experts. Artificial intelligence has become a popular means of processing medical data automatically, acting as a supportive tool for doctors. However, the machine learning models behind these tools depend heavily on the data used to train them. In medicine, large amounts of data can be difficult to obtain because of privacy restrictions, expensive and time-consuming annotation, and a general lack of samples for infrequent lesions. Here, we present a novel synthetic data generation pipeline, called SinGAN-Seg, that produces synthetic medical images with corresponding masks from a single training image. Our method differs from traditional GANs in that the model needs only a single image and its corresponding ground truth to train. It can therefore produce alternative artificial segmentation datasets with ground truth masks when real datasets cannot be shared. We evaluate the pipeline using qualitative and quantitative comparisons between real and synthetic data, which show that the style transfer technique used in the pipeline significantly improves the quality of the generated data and that our method outperforms other state-of-the-art GANs at preparing synthetic images when the training dataset is small. By training UNet++ on both real data and synthetic data generated by the SinGAN-Seg pipeline, we show that models trained on synthetic data perform very close to those trained on real data when the datasets contain a considerable amount of data. In contrast, when the training dataset is small, synthetic data generated by the SinGAN-Seg pipeline can improve the performance of segmentation models. The code is available on GitHub.
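As a rough illustration of what training on "a single image and the corresponding ground truth" could look like at the data level, the sketch below stacks an RGB image and its binary mask into one four-channel array and splits a generated sample back into an image and a mask. This layout is an assumption made for illustration only; the abstract does not specify the input format, and the function names `pack_image_and_mask` and `unpack_image_and_mask` are hypothetical, not taken from the SinGAN-Seg code.

```python
import numpy as np


def pack_image_and_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stack an RGB image (H, W, 3) and a mask (H, W) into one (H, W, 4) array.

    Assumed layout (not confirmed by the abstract): channels 0-2 hold the
    image scaled to [0, 1], channel 3 holds the binarized mask.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    image_f = image.astype(np.float32) / 255.0          # scale pixels to [0, 1]
    mask_f = (mask > 0).astype(np.float32)[..., None]   # binarize, add channel axis
    return np.concatenate([image_f, mask_f], axis=-1)


def unpack_image_and_mask(sample: np.ndarray, threshold: float = 0.5):
    """Split an (H, W, 4) sample into a uint8 image and a binary uint8 mask."""
    image = np.clip(sample[..., :3] * 255.0, 0, 255).round().astype(np.uint8)
    mask = (sample[..., 3] >= threshold).astype(np.uint8)  # threshold the mask channel
    return image, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image
    mask = np.zeros((64, 64), dtype=np.uint8)
    mask[16:48, 16:48] = 1                                          # stand-in lesion mask
    packed = pack_image_and_mask(image, mask)
    restored_image, restored_mask = unpack_image_and_mask(packed)
    print(packed.shape, restored_image.dtype, int(restored_mask.sum()))
```

Keeping the image and mask in one array means that any sample a generator produces in this format carries a mask that is spatially aligned with its image by construction, which is the property a synthetic segmentation dataset needs.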