Diffusion models have emerged as the new state of the art in visual generation. In particular, text-to-image diffusion models, which generate images from caption descriptions, have attracted increasing attention owing to their impressive user controllability. Despite this encouraging performance, they exacerbate concerns about the misuse of fake images and place new pressure on fake image detection. In this work, we pioneer a systematic study of the authenticity of fake images generated by text-to-image diffusion models. Specifically, we conduct comprehensive studies from two perspectives unique to text-to-image models: the visual modality and the linguistic modality. For the visual modality, we propose universal detection, which demonstrates that fake images from these text-to-image diffusion models share common cues that enable us to distinguish them from real images. We then propose source attribution, which reveals that each diffusion model carries a unique fingerprint that can be used to attribute each fake image to its source model. A variety of ablation and analysis studies further interpret the improvements contributed by each of our proposed methods. For the linguistic modality, we analyze in depth how text captions affect the image authenticity of text-to-image diffusion models (which we call prompt analysis), and relate these effects to the detection and attribution performance on fake images. Our findings contribute to the community's insight into the intrinsic properties of text-to-image diffusion models, and we urge the community to consider countermeasures, such as ours, against rapidly evolving fake image generators.