Research on scientific image integrity faces a challenging bottleneck: the lack of available datasets for designing and evaluating forensic techniques. Because the data are sensitive, legal hurdles prevent researchers from relying on real tampered cases to build an accessible forensic benchmark. To mitigate this bottleneck, we present an extensible open-source library that reproduces the most common image forgery operations reported by the research integrity community: duplication, retouching, and cleaning. Using this library and realistic scientific images, we create a large scientific forgery image benchmark (39,423 images) with enriched ground truth. In addition, motivated by the high number of papers retracted due to image duplication, this work evaluates state-of-the-art copy-move detection methods on the proposed dataset, using a new metric that checks for consistent match detection between the source and the copied region. The dataset and source code will be freely available upon acceptance of the paper.