Recent generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose, simply by learning from unlabeled image collections. Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification. Using a pretrained generator, we first find the latent code corresponding to a given real input image. Applying perturbations to the code creates natural variations of the image, which can then be ensembled together at test time. We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars. Critically, we find that several design decisions are required to make this process work: the perturbation procedure, the weighting between the augmentations and the original image, and training the classifier on synthesized images can all impact the result. Currently, we find that while test-time ensembling with GAN-based augmentations can offer some small improvements, the remaining bottlenecks are the efficiency and accuracy of the GAN reconstructions, coupled with classifier sensitivities to artifacts in GAN-generated images.
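The pipeline described above (invert the real image to a latent code, perturb the code, generate views, then take a weighted ensemble of classifier predictions) can be sketched as follows. This is a minimal illustration with toy linear stand-ins: `generate`, `invert`, and `classify` are hypothetical placeholders for the StyleGAN2 generator, a GAN-inversion procedure, and a trained classifier, and the hyperparameters `sigma` and `alpha` are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and random linear maps standing in for the real
# components (hypothetical; a real setup would use StyleGAN2, an
# optimization- or encoder-based inversion, and a trained classifier).
LATENT_DIM, IMAGE_DIM, NUM_CLASSES = 8, 16, 3
G_W = rng.normal(size=(IMAGE_DIM, LATENT_DIM))   # "generator" weights
C_W = rng.normal(size=(NUM_CLASSES, IMAGE_DIM))  # "classifier" weights

def generate(w):
    """Map a latent code to an image (placeholder for the GAN generator)."""
    return G_W @ w

def classify(x):
    """Return class probabilities (placeholder softmax classifier)."""
    logits = C_W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def invert(x):
    """Project an image back to a latent code (placeholder for GAN
    inversion; here just a least-squares solve against the toy generator)."""
    return np.linalg.lstsq(G_W, x, rcond=None)[0]

def ensemble_predict(x_real, n_views=8, sigma=0.1, alpha=0.5):
    """Ensemble the original image with GAN-based augmentations.

    sigma controls the latent perturbation scale; alpha weights the
    original prediction against the mean over perturbed-latent views.
    Both are assumed hyperparameters corresponding to the "perturbation
    procedure" and "weighting" design decisions discussed in the text.
    """
    w = invert(x_real)
    view_probs = [
        classify(generate(w + sigma * rng.normal(size=w.shape)))
        for _ in range(n_views)
    ]
    return alpha * classify(x_real) + (1 - alpha) * np.mean(view_probs, axis=0)

# A synthetic "real" image drawn from the toy generator's range.
x = G_W @ rng.normal(size=LATENT_DIM)
probs = ensemble_predict(x)
print(probs)  # a valid probability vector over NUM_CLASSES classes
```

The weighted combination in `ensemble_predict` reflects the observation that the original image and its GAN reconstructions should not be trusted equally, since inversion errors and generator artifacts can mislead the classifier.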