Distinguishing between computer-generated (CG) and natural photographic (PG) images is of great importance for verifying the authenticity and originality of digital images. However, recent cutting-edge generation methods can synthesize highly realistic CG images, which makes this already challenging task even harder. To address this issue, we propose a joint learning strategy that combines deep texture and high-frequency features for CG image detection. We first formulate and analyze in depth the different acquisition processes of CG and PG images. Based on the finding that the different modules involved in image acquisition cause inconsistent sensitivity to convolutional neural network (CNN)-based rendering in images, we propose a deep texture rendering module that enhances texture differences and yields a discriminative texture representation. Specifically, a semantic segmentation map is generated to guide an affine transformation that recovers the texture in different regions of the input image. Then, the original image, together with the high-frequency components of the original and rendered images, is fed into a multi-branch neural network equipped with attention mechanisms, which refine intermediate features and facilitate trace exploration in the spatial and channel dimensions, respectively. Extensive experiments on two public datasets and a newly constructed dataset with more realistic and diverse images show that the proposed approach outperforms existing methods by a clear margin. The results also demonstrate that the proposed approach is robust to post-processing operations and generalizes to images generated by generative adversarial networks (GANs).
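The data flow described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The high-frequency component is taken as the residual of a 3x3 box blur (the abstract does not name a filter). The segmentation-guided affine transformation is reduced to fixed per-class scale and shift parameters (`gamma`, `beta`) applied directly to pixels, whereas the paper's module presumably predicts them. The attention functions are a parameter-free, CBAM-style simplification of channel and spatial attention. All function names are hypothetical.

```python
import numpy as np


def high_frequency(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Return img minus a k x k box-blurred copy (a simple high-pass residual).

    img has shape (H, W, C). The box filter is an assumption; the abstract
    does not specify how high-frequency components are extracted.
    """
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    h, w = img.shape[:2]
    low = np.zeros(img.shape, dtype=np.float64)
    # Sum the k*k shifted copies, then average: a box (mean) low-pass filter.
    for dy in range(k):
        for dx in range(k):
            low += padded[dy:dy + h, dx:dx + w]
    low /= k * k
    return img - low


def region_affine(img, seg, gamma, beta):
    """Hypothetical segmentation-guided affine transform.

    A pixel of class c becomes gamma[c] * pixel + beta[c], per channel.
    seg: (H, W) integer class map; gamma, beta: (num_classes, C).
    Here the parameters are given directly instead of being predicted.
    """
    return img * gamma[seg] + beta[seg]


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat):
    """Reweight channels of an (H, W, C) map by a sigmoid of their global
    average- and max-pooled responses (CBAM-style, without the learned MLP)."""
    weights = _sigmoid(feat.mean(axis=(0, 1)) + feat.max(axis=(0, 1)))  # (C,)
    return feat * weights


def spatial_attention(feat):
    """Reweight positions by a sigmoid of the channel-wise mean and max
    (CBAM-style, without the learned convolution)."""
    weights = _sigmoid(feat.mean(axis=2) + feat.max(axis=2))  # (H, W)
    return feat * weights[..., None]


def branch_inputs(img, seg, gamma, beta):
    """The three inputs named in the abstract: the original image and the
    high-frequency components of the original and rendered images."""
    rendered = region_affine(img, seg, gamma, beta)
    return [img, high_frequency(img), high_frequency(rendered)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8, 3))
    seg = np.zeros((8, 8), dtype=int)
    seg[:, 4:] = 1  # two hypothetical regions
    gamma = np.array([[1.0, 1.0, 1.0], [1.2, 0.9, 1.1]])
    beta = np.zeros((2, 3))
    branches = [spatial_attention(channel_attention(x))
                for x in branch_inputs(img, seg, gamma, beta)]
    print([b.shape for b in branches])  # [(8, 8, 3), (8, 8, 3), (8, 8, 3)]
```

In the described network, attention acts on intermediate CNN features within each branch; applying it to the raw inputs here only demonstrates the reweighting. In a trained model, `gamma`, `beta`, and the attention weights would be learned rather than fixed.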