The ability to carry out meaningful forensic analysis on printed and scanned images plays a major role in many applications. First, printed documents are often associated with criminal activities, such as terrorist plans, child pornography pictures, and even fake packages. Additionally, printing and scanning can be used to hide the traces of image manipulation or the synthetic nature of images, since the artifacts commonly found in manipulated and synthetic images are lost once the images are printed and scanned. A problem hindering research in this area is the lack of large-scale reference datasets for algorithm development and benchmarking. Motivated by this issue, we present a new dataset composed of a large number of synthetic and natural printed face images. To highlight the difficulties associated with analyzing the images in the dataset, we carried out an extensive set of experiments comparing several printer attribution methods. We also verified that state-of-the-art methods for distinguishing natural from synthetic face images fail when applied to printed and scanned images. We envision that the availability of the new dataset and the preliminary experiments we carried out will motivate and facilitate further research in this area.