Deep learning models frequently suffer from problems such as class imbalance and a lack of robustness to distribution shift. Suitable training data beyond the available benchmarks is often difficult to find, especially for computer vision models. With the advent of Generative Adversarial Networks (GANs), however, it is now possible to generate high-quality synthetic data that can alleviate some of these challenges. In this work, we present a detailed analysis of the effect of training computer vision models on different proportions of synthetic data mixed with real (organic) data. Specifically, we analyze how the quantity of synthetic data mixed with the original data affects a model's robustness to out-of-distribution data and the general quality of its predictions.