With generative models proliferating at a rapid rate, there is a growing need for general-purpose fake image detectors. In this work, we first show that the existing paradigm, which consists of training a deep network for real-vs-fake classification, fails to detect fake images from newer breeds of generative models when trained to detect GAN-generated fake images. Upon analysis, we find that the resulting classifier is asymmetrically tuned to detect patterns that make an image fake. The real class becomes a sink class holding anything that is not fake, including generated images from models not accessible during training. Building upon this discovery, we propose to perform real-vs-fake classification without learning, i.e., using a feature space not explicitly trained to distinguish real from fake images. We use nearest neighbor classification and linear probing as instantiations of this idea. When given access to the feature space of a large pretrained vision-language model, the very simple baseline of nearest neighbor classification generalizes surprisingly well in detecting fake images from a wide variety of generative models; e.g., it improves upon the state of the art by +15.07 mAP and +25.90% accuracy when tested on unseen diffusion and autoregressive models.
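The two instantiations named above can be sketched in plain Python as a minimal illustration, not the authors' implementation. It assumes each image has already been mapped to a feature vector by a frozen pretrained encoder (for example, the image encoder of a vision-language model such as CLIP); the toy 3-dimensional vectors below stand in for those features. Nearest neighbor assigns a query the label of its closest stored feature under cosine distance, and the linear probe fits only a logistic-regression layer on top of the frozen features. All function names (`nearest_neighbor_predict`, `train_linear_probe`, etc.) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity between two feature vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a_n, b_n))


def nearest_neighbor_predict(query: Vector, real_bank: List[Vector],
                             fake_bank: List[Vector]) -> int:
    """Return 1 (fake) if the closest stored feature is fake, else 0 (real).

    No parameters are learned: the (assumed frozen) feature space does all the work.
    """
    d_real = min(cosine_distance(query, r) for r in real_bank)
    d_fake = min(cosine_distance(query, f) for f in fake_bank)
    return 1 if d_fake < d_real else 0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(features: List[Vector], labels: List[int],
                       lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit logistic regression (label 1 = fake) by full-batch gradient descent.

    Only the weights w and bias b are trained; the features stay fixed.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def linear_probe_predict(x: Vector, w: List[float], b: float) -> int:
    """Return 1 (fake) if the logit is positive, else 0 (real)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


if __name__ == "__main__":
    # Toy stand-ins for frozen-encoder features (hypothetical values).
    real_bank = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]
    fake_bank = [[0.0, 1.0, 0.2], [0.1, 0.9, 0.0]]
    query = [0.05, 0.95, 0.1]
    print(nearest_neighbor_predict(query, real_bank, fake_bank))
    w, b = train_linear_probe(real_bank + fake_bank, [0, 0, 1, 1])
    print(linear_probe_predict(query, w, b))
```

The nearest neighbor variant has no trainable parameters at all, which matches the idea of classifying without learning a real-vs-fake feature space; the linear probe trains only a single linear layer, so the feature space itself still remains untouched.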