Whereas generative adversarial networks are capable of synthesizing highly realistic images of faces, cats, landscapes, or almost any other single category, paint-by-text synthesis engines can, from a single text prompt, synthesize realistic images of seemingly endless categories in arbitrary configurations and combinations. This powerful technology poses new challenges to the photo-forensic community. Motivated by the fact that paint-by-text synthesis is not based on explicit geometric or physical models, and by the human visual system's general insensitivity to lighting inconsistencies, we provide an initial exploration of the lighting consistency of DALL-E-2 synthesized images to determine whether physics-based forensic analyses will prove fruitful in detecting this new breed of synthetic media.