Classifiers and generators have long been separated. We break down this separation and showcase that conventional neural network classifiers can generate high-quality images of a large number of categories, being comparable to the state-of-the-art generative models (e.g., DDPMs and GANs). We achieve this by computing the partial derivative of the classification loss function with respect to the input to optimize the input to produce an image. Since it is widely known that directly optimizing the inputs is similar to targeted adversarial attacks incapable of generating human-meaningful images, we propose a mask-based stochastic reconstruction module to make the gradients semantic-aware to synthesize plausible images. We further propose a progressive-resolution technique to guarantee fidelity, which produces photorealistic images. Furthermore, we introduce a distance metric loss and a non-trivial distribution loss to ensure classification neural networks can synthesize diverse and high-fidelity images. Using traditional neural network classifiers, we can generate good-quality images of 256$\times$256 resolution on ImageNet. Intriguingly, our method is also applicable to text-to-image generation by regarding image-text foundation models as generalized classifiers. Proving that classifiers have learned the data distribution and are ready for image generation has far-reaching implications, for classifiers are much easier to train than generative models like DDPMs and GANs. We don't even need to train classification models because tons of public ones are available for download. Also, this holds great potential for the interpretability and robustness of classifiers. Project page is at \url{https://classifier-as-generator.github.io/}.
翻译:我们打破了这一分解,并展示了常规神经网络分类师能够产生大量类别的高质量图像,与最先进的基因化模型(如DDPMs和GANs)相仿。我们通过计算分类损失功能的部分衍生物来达到这一目的,以优化输入来生成图像。由于人们广泛知道直接优化投入与无法生成具有人意义的图像的定向对立攻击相似,我们提议了一个基于掩码的智能重建模块,使梯度的语义意识能够合成貌似图像。我们进一步提出了渐进式解析技术,以保证真实性(如DDPMs和GANs)模型。此外,我们引入了远程计量损失和非微量分配损失,以确保分类神经网络能够合成多样性和高知性图像。由于使用传统的神经网络模型分类,我们可以为图像网络生成256美元/doltime的高质量潜在数据解析。我们的方法在图像网络上甚至可以使梯度-义性认知合成图像合成工具合成。我们的方法也非常容易被应用到文本-直观模型的模型。