Classifiers and generators have long been treated as separate kinds of models. We break down this separation and show that conventional neural network classifiers can generate high-quality images across a large number of categories, comparable to those of state-of-the-art generative models (e.g., DDPMs and GANs). We achieve this by computing the partial derivative of the classification loss with respect to the input and using it to optimize the input into an image. It is widely known that directly optimizing the input in this way resembles a targeted adversarial attack and does not produce human-meaningful images, so we propose a mask-based stochastic reconstruction module that makes the gradients semantic-aware and lets them synthesize plausible images. We further propose a progressive-resolution technique to ensure fidelity, which yields photorealistic images. In addition, we introduce a distance metric loss and a non-trivial distribution loss so that classification networks synthesize diverse, high-fidelity images. Using traditional neural network classifiers, we generate good-quality images at 256$\times$256 resolution on ImageNet. Intriguingly, our method also applies to text-to-image generation when image-text foundation models are regarded as generalized classifiers. Showing that classifiers have learned the data distribution and are ready for image generation has far-reaching implications, because classifiers are much easier to train than generative models such as DDPMs and GANs. Classification models need not even be trained, since many public ones are available for download. This result also holds great potential for the interpretability and robustness of classifiers.
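The core step named in the abstract, differentiating the classification loss with respect to the input and then updating the input, can be illustrated with a minimal sketch. The sketch is not the method itself: it assumes a toy linear softmax classifier with random weights in place of a trained network, and it omits the mask-based stochastic reconstruction module, the progressive-resolution technique, and the distance metric and distribution losses. All names (`loss_and_input_grad`, `optimize_input`, `W`, `b`) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_of(W, b, x):
    """Logits of a toy linear classifier: z = W x + b (assumed stand-in model)."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def loss_and_input_grad(W, b, x, target):
    """Cross-entropy loss for class `target` and its gradient w.r.t. the input x."""
    p = softmax(logits_of(W, b, x))
    loss = -math.log(p[target])
    # dL/dz_i = p_i - 1[i == target]; chain rule gives dL/dx_j = sum_i dL/dz_i * W[i][j]
    dz = [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]
    grad = [sum(dz[i] * W[i][j] for i in range(len(W))) for j in range(len(x))]
    return loss, grad


def optimize_input(W, b, x0, target, steps=200, lr=0.1):
    """Gradient descent on the input only; the classifier weights stay fixed."""
    x = list(x0)
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_input_grad(W, b, x, target)
        losses.append(loss)
        # Clamp to [0, 1] so the input remains a valid "pixel" vector.
        x = [min(1.0, max(0.0, v - lr * g)) for v, g in zip(x, grad)]
    return x, losses


# Toy setup: 3 classes, 8 "pixels", random fixed weights (all assumptions).
rng = random.Random(0)
W = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(3)]
b = [0.0, 0.0, 0.0]
x0 = [0.5] * 8

x_opt, losses = optimize_input(W, b, x0, target=2)
final_probs = softmax(logits_of(W, b, x_opt))
```

On its own, this loop only pushes the input toward whatever the classifier scores highly for the target class. This is the targeted-adversarial-attack behaviour that the abstract says the additional modules are designed to overcome.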