A common problem in computer vision, particularly in medical applications, is a lack of sufficiently large and diverse training datasets. The datasets that do exist often suffer from severe class imbalance. As a result, networks often overfit and fail to generalize to novel examples. Generative Adversarial Networks (GANs) offer a novel method of synthetic data augmentation. In this work, we evaluate the use of GAN-based data augmentation to artificially expand the CheXpert dataset of chest radiographs. We compare its performance to that of traditional augmentation and find that GAN-based augmentation leads to higher downstream performance for underrepresented classes. Furthermore, we see that this result is pronounced in low-data regimes. This suggests that GAN-based augmentation is a promising area of research for improving network performance when data collection is prohibitively expensive.