Identifying how a file was created is often valuable in security, for attackers and defenders alike. Attackers can exploit this information to tune their attacks, and defenders can use it to understand how a malicious file was created after an incident. In this work, we want to identify how a PDF file was created. This problem is important because PDF files are extremely popular: many organizations publish PDF files online, and malicious PDF files are commonly used by attackers. Our approach to detecting which software produced a PDF file is based on coding style: we look for patterns that are only created by certain PDF producers. We analyzed the coding style of 900 PDF files produced using 11 PDF producers on 3 different operating systems, and obtained a set of 192 rules that can be used to identify these 11 PDF producers. We tested our detection tool on 508,836 PDF files published on scientific preprint servers. Our tool detects certain producers with an accuracy of 100%, and its overall detection rate remains high (74%). We were also able to apply our tool to identify how online PDF services work and to spot inconsistencies.
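As an illustration of the rule-based idea described above, the following is a minimal sketch, not the tool or the 192 rules from this work. The producer labels (`producer_a`, `producer_b`) and the byte patterns attached to them are hypothetical and chosen only to show the mechanism: a file is attributed to a producer when all of that producer's patterns match and no other producer's rules do.

```python
import re

# Hypothetical rules (assumption, not taken from the paper): each made-up
# producer label maps to byte patterns that, for the sake of the example,
# only that producer is assumed to emit.
RULES = {
    "producer_a": [
        re.compile(rb"^%PDF-1\.\d\r\n"),        # header terminated by CRLF
        re.compile(rb"/Type /Catalog"),         # space between key and name
    ],
    "producer_b": [
        re.compile(rb"^%PDF-1\.\d\n%\xe2\xe3\xcf\xd3"),  # LF, then binary comment
        re.compile(rb"/Type/Catalog"),          # no space between key and name
    ],
}


def guess_producer(data: bytes):
    """Return the single label whose patterns all match, else None."""
    matches = [
        name
        for name, patterns in RULES.items()
        if all(p.search(data) for p in patterns)
    ]
    # Refuse to answer when no rule set, or more than one, matches.
    return matches[0] if len(matches) == 1 else None


sample_a = b"%PDF-1.4\r\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
sample_b = b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<</Type/Catalog>>\nendobj\n"
```

Here `guess_producer(sample_a)` selects the first hypothetical label and `guess_producer(sample_b)` the second. Returning `None` on zero or ambiguous matches is a design choice of this sketch: it keeps the rules conservative at the cost of leaving some files unattributed.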