Image manipulation detection is different from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned. We propose a two-stream Faster R-CNN network and train it end-to-end to detect the tampered regions given a manipulated image. One of the two streams is an RGB stream whose purpose is to extract features from the RGB image input to find tampering artifacts such as strong contrast differences and unnatural tampered boundaries. The other is a noise stream that leverages the noise features extracted from a steganalysis rich model (SRM) filter layer to discover the noise inconsistency between authentic and tampered regions. We then fuse the features from the two streams through a bilinear pooling layer to further incorporate the spatial co-occurrence of these two modalities. Experiments on four standard image manipulation datasets demonstrate that our two-stream framework outperforms each individual stream and achieves state-of-the-art performance compared to alternative methods, while remaining robust to resizing and compression.
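The two components named in the abstract can be sketched concretely. Below is a minimal NumPy illustration, under stated assumptions: the 5×5 second-order high-pass kernel is one of the SRM kernels commonly used in steganalysis implementations (the paper uses a small fixed set of such kernels), and the fusion shown is standard bilinear pooling (sum of outer products over spatial locations with signed square root and L2 normalization), not a verbatim reproduction of the paper's layer.

```python
import numpy as np

# One 5x5 SRM high-pass kernel (second-order residual, normalized by 1/12).
# Assumption: this is one of the fixed steganalysis kernels; the paper's
# noise stream uses a small set of such filters with frozen weights.
SRM_KERNEL = np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
], dtype=np.float32) / 12.0


def noise_residual(img, kernel=SRM_KERNEL):
    """'Valid' 2-D cross-correlation producing the noise residual map.

    Because the kernel's coefficients sum to zero, smooth regions give a
    near-zero response, while splicing from a source with different noise
    statistics leaves a detectable inconsistency.
    """
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out


def bilinear_pool(f_rgb, f_noise):
    """Bilinear fusion of the two streams' per-location feature vectors.

    f_rgb: (H*W, C1) RGB-stream features, f_noise: (H*W, C2) noise-stream
    features. Summing outer products over locations captures the spatial
    co-occurrence of the two modalities; signed sqrt + L2 normalization
    follows standard bilinear-CNN practice.
    """
    x = f_rgb.T @ f_noise                # (C1, C2) co-occurrence matrix
    x = x.ravel()
    x = np.sign(x) * np.sqrt(np.abs(x))  # signed square root
    return x / (np.linalg.norm(x) + 1e-8)
```

In the full model these operations sit inside a two-stream Faster R-CNN: the SRM layer feeds the noise stream's backbone, and the pooled bilinear feature feeds the per-region classification head.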