Do GANs replicate training images? Previous studies have suggested that GANs do not replicate training data without significant changes to the training procedure. This has led to a series of studies on the exact conditions under which GANs overfit to the training data. Although a number of factors have been identified theoretically or empirically, the effect of dataset size and complexity on GAN replication is still unknown. With empirical evidence from BigGAN and StyleGAN2 on the CelebA, Flower, and LSUN-bedroom datasets, we show that dataset size and complexity play an important role in GAN replication and in the perceptual quality of the generated images. We further quantify this relationship, finding that the replication percentage decays exponentially with respect to dataset size and complexity, with a shared decay factor across GAN-dataset combinations. Meanwhile, perceptual image quality follows a U-shaped trend with respect to dataset size. These findings lead to a practical tool for one-shot estimation of the minimal dataset size needed to prevent GAN replication, which can be used to guide dataset construction and selection.
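The exponential-decay relationship described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's actual fitting procedure: it assumes a model of the form r(n) = a·exp(-k·n) for replication percentage r at dataset size n, fits it by log-linear least squares, and inverts it for a one-shot estimate of the minimal size keeping replication below a target. All numbers are synthetic.

```python
import numpy as np

def fit_decay(sizes, repl):
    """Fit log r = log a - k * n by least squares; return (a, k)."""
    slope, intercept = np.polyfit(sizes, np.log(repl), 1)
    return np.exp(intercept), -slope

def min_size_for(a, k, target):
    """Smallest n with a * exp(-k * n) <= target."""
    return np.log(a / target) / k

# Synthetic observations generated from hypothetical a = 80 (%), k = 1e-3;
# real values would come from measured replication percentages.
sizes = np.array([500, 1000, 2000, 4000, 8000], dtype=float)
repl = 80.0 * np.exp(-1e-3 * sizes)

a, k = fit_decay(sizes, repl)
n_min = min_size_for(a, k, target=1.0)  # size where replication drops below 1%
```

Because the data here are noiseless, the fit recovers the generating parameters exactly; with measured replication percentages the same log-linear fit would yield estimates with residual error.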