Generative models are now capable of producing highly realistic images that look nearly indistinguishable from the data on which they are trained. This raises the question: if we have good enough generative models, do we still need datasets? We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from data. Given an off-the-shelf image generator, without any access to its training data, we train representations from the samples this generator outputs. We compare several representation learning methods that can be applied in this setting, using the latent space of the generator to generate multiple "views" of the same semantic content. We show that for contrastive methods, this multiview data naturally yields positive pairs (nearby in latent space) and negative pairs (far apart in latent space). We find that the resulting representations rival those learned directly from real data, but that good performance requires care in the choice of sampling strategy and training method. Generative models can be viewed as a compressed and organized copy of a dataset, and we envision a future in which "model zoos" proliferate while datasets become increasingly unwieldy, missing, or private. This paper suggests several techniques for visual representation learning in such a future. Code is released on our project page: https://ali-design.github.io/GenRep/
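To make the latent-space view generation concrete, the sketch below illustrates one way positive pairs could be drawn from a black-box generator: decode an anchor latent and a nearby perturbed latent into two images that share semantic content, with independently drawn latents serving as negatives. This is a minimal illustration under assumed names, not the paper's exact pipeline; `sample_views`, the perturbation scale `sigma`, and the toy stand-in generator `toy_G` are hypothetical, and in practice `G` would be an off-the-shelf pretrained generator accessed only through sampling.

```python
# Minimal sketch (illustrative, not the authors' exact method): use a frozen
# black-box generator G(z) to produce positive pairs for contrastive learning.
# Positives come from nearby latents; negatives come from far-apart latents.

import torch

def sample_views(G, batch_size=8, latent_dim=128, sigma=0.2, device="cpu"):
    """Draw anchor latents, then perturb them to create positive 'views'.

    Each anchor z and its perturbation z + sigma * eps decode into two images
    that share semantic content (a positive pair); images decoded from
    independently drawn latents act as negatives in the contrastive loss.
    """
    z_anchor = torch.randn(batch_size, latent_dim, device=device)   # anchor latents
    z_pos = z_anchor + sigma * torch.randn_like(z_anchor)           # nearby latents -> positive views
    with torch.no_grad():                                           # generator is a frozen black box
        x_anchor = G(z_anchor)
        x_pos = G(z_pos)
    return x_anchor, x_pos  # feed both views to any contrastive objective (e.g. InfoNCE)

if __name__ == "__main__":
    # Toy stand-in generator mapping latents to fake 3x32x32 "images",
    # used only so the sketch runs end to end.
    toy_G = torch.nn.Sequential(
        torch.nn.Linear(128, 3 * 32 * 32),
        torch.nn.Tanh(),
        torch.nn.Unflatten(1, (3, 32, 32)),
    )
    x1, x2 = sample_views(toy_G)
    print(x1.shape, x2.shape)  # torch.Size([8, 3, 32, 32]) for each view
```

The key design choice this sketch highlights is that "views" are defined by proximity in latent space rather than by pixel-space augmentations, so the quality of the learned representation depends on how the latent perturbations are sampled.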