A blind spot is any input to a program that can be arbitrarily mutated without affecting the program's output. Blind spots can be used for steganography or to embed malware payloads. When blind spots overlap file format keywords, they indicate parsing bugs that can lead to exploitable differentials. For example, one could craft a document that renders one way in one viewer and a completely different way in another. They have also been used to circumvent code signing in Android binaries, to coerce certificate authorities into misbehaving, and to execute HTTP request smuggling and parameter pollution attacks. This paper formalizes the operational semantics of blind spots, leading to a technique based on dynamic information flow tracking that detects blind spots automatically. An efficient implementation is introduced and evaluated against a corpus of over a thousand diverse PDFs parsed through MuPDF, revealing exploitable bugs in the parser. All of the blind spot classifications are confirmed to be correct, and the missed detection rate is no higher than 11%. On average, at least 5% of each PDF file is completely ignored by the parser. Our results suggest that this technique is a promising, efficient, automated means of detecting exploitable parser bugs, over-permissiveness, and differentials. Nothing in the technique is specific to PDF, so it can be applied immediately to other notoriously difficult-to-parse formats such as ELF, X.509, and XML.
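To make the definition concrete, the following is a minimal sketch of blind spot detection on a toy parser. It is not the paper's implementation: the record format, `TrackedInput`, `parse_record`, `candidate_blind_spots`, and `confirm_blind` are all hypothetical names introduced for illustration, and the tracking is deliberately coarse. It only records which input offsets the parser reads, whereas full dynamic information flow tracking would also follow how each byte propagates to the output. A byte that is read but then discarded would therefore be missed by this sketch.

```python
class TrackedInput:
    """Wraps input bytes and records every offset the parser reads."""

    def __init__(self, data: bytes):
        self._data = data
        self.touched = set()

    def __len__(self) -> int:
        return len(self._data)

    def read(self, offset: int) -> int:
        self.touched.add(offset)
        return self._data[offset]


def parse_record(inp: TrackedInput) -> bytes:
    """Toy parser for a hypothetical format: byte 0 is a payload length N,
    bytes 1..N are the payload, and anything after that is never read."""
    n = inp.read(0)
    return bytes(inp.read(1 + i) for i in range(n))


def run(data: bytes):
    """Run the parser; a truncated record counts as a distinct error output."""
    try:
        return parse_record(TrackedInput(data))
    except IndexError:
        return None


def candidate_blind_spots(data: bytes):
    """Return the parser output and the offsets the parser never read."""
    inp = TrackedInput(data)
    output = parse_record(inp)
    blind = [i for i in range(len(data)) if i not in inp.touched]
    return output, blind


def confirm_blind(data: bytes, offset: int) -> bool:
    """Check the definition directly for one byte: every possible value
    at `offset` must leave the parser output unchanged."""
    baseline = run(data)
    for value in range(256):
        mutated = data[:offset] + bytes([value]) + data[offset + 1:]
        if run(mutated) != baseline:
            return False
    return True


if __name__ == "__main__":
    sample = b"\x03abcXYZ"  # length 3, payload "abc", three trailing bytes
    output, blind = candidate_blind_spots(sample)
    print(output, blind)
    print([confirm_blind(sample, i) for i in blind])
```

In this toy format the trailing bytes are never read, so they are reported as candidates and then confirmed by exhaustive single-byte mutation. The exhaustive check is only feasible here because the input is tiny; avoiding that cost on real files is the motivation for a tracking-based approach.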