Within an operational framework, covers used by a steganographer are likely to come from different sensors and different processing pipelines than the ones used by researchers for training their steganalysis models. Thus, a performance gap is unavoidable when it comes to out-of-distribution covers, an extremely frequent scenario called Cover Source Mismatch (CSM). Here, we explore a grid of processing pipelines to study the origins of CSM, to better understand it, and to better tackle it. A greedy set-covering algorithm is used to select representative pipelines, minimizing the maximum regret between each representative and the pipelines within the set it covers. Our main contribution is a methodology for generating relevant training bases able to tackle operational CSM. Experimental validation highlights that, for a given number of training samples, our set-covering selection is a better strategy than selecting random pipelines or using all the available pipelines. Our analysis also shows that parameters such as denoising, sharpening, and downsampling are very important to foster diversity. Finally, benchmarks on classical and wild databases show that the extracted databases generalize well. Additional resources are available at github.com/RonyAbecidan/HolisticSteganalysisWithSetCovering.
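The selection step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a precomputed matrix `regret[i][j]` (the performance loss when a detector trained on pipeline `i` is tested on pipeline `j`), a hypothetical tolerance `threshold`, and the standard greedy heuristic for set cover. The function names and the toy numbers are illustrative only.

```python
def greedy_set_cover(regret, threshold):
    """Pick representative pipelines so every pipeline is covered.

    Assumption: pipeline i "covers" pipeline j when regret[i][j] <= threshold.
    """
    n = len(regret)
    # covers[i]: pipelines whose regret stays within the threshold when training on i
    covers = [{j for j in range(n) if regret[i][j] <= threshold} for i in range(n)]
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Greedy step: take the pipeline covering the most still-uncovered pipelines
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            raise ValueError("some pipelines cannot be covered at this threshold")
        chosen.append(best)
        uncovered -= gained
    return chosen


def max_regret(regret, chosen):
    """Worst-case regret when each pipeline is served by its best representative."""
    n = len(regret)
    return max(min(regret[i][j] for i in chosen) for j in range(n))


# Toy regret matrix for four hypothetical pipelines (made-up values)
regret = [
    [0.00, 0.02, 0.30, 0.25],
    [0.03, 0.00, 0.28, 0.31],
    [0.27, 0.33, 0.00, 0.04],
    [0.22, 0.29, 0.05, 0.00],
]

representatives = greedy_set_cover(regret, threshold=0.05)
print(representatives, max_regret(regret, representatives))
```

In this toy case, pipelines 0 and 1 cover each other, as do pipelines 2 and 3, so two representatives suffice. Lowering `threshold` tightens the maximum regret at the cost of more representatives.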