A blind spot is any part of a program's input that can be arbitrarily mutated without affecting the program's output. Blind spots can be used for steganography or to embed malware payloads. If blind spots overlap file format keywords, they indicate parsing bugs that can lead to differentials. This paper formalizes the operational semantics of blind spots, leading to a technique that automatically detects blind spots based on dynamic information flow tracking. An efficient implementation is introduced and evaluated against a corpus of over a thousand diverse PDFs. There are zero false-positive blind spot classifications, and the missed detection rate is bounded above by 11%. On average, at least 5% of each PDF file is completely ignored by the parser. Our results suggest that this technique is a promising, efficient, automated means of detecting parser bugs and differentials. Nothing in the technique is specific to PDF, so it can be immediately applied to other notoriously difficult-to-parse formats like ELF, X.509, and XML.
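To make the definition of a blind spot concrete, the following is a minimal sketch that checks it by brute force on a toy input. It is not the dynamic information flow tracking technique described above: it simply tries every alternative value for each byte, one byte at a time, and reports the offsets where the output never changes. The parser `toy_parse` and the helper `single_byte_blind_spots` are hypothetical names introduced only for illustration, and testing one byte at a time is only an approximation of "arbitrarily mutated", since it does not consider mutating several bytes together.

```python
from typing import Callable, List


def toy_parse(data: bytes) -> bytes:
    """Hypothetical toy parser (an assumption for illustration only).

    Byte 0 is a length n, bytes 1..n are the payload, and anything
    after the payload is never read.
    """
    if not data:
        return b""
    n = data[0]
    return data[1:1 + n]


def single_byte_blind_spots(parse: Callable[[bytes], bytes], data: bytes) -> List[int]:
    """Return offsets whose byte can take any value without changing parse(data).

    This is exhaustive per byte (all 255 alternative values), but it
    mutates only one byte at a time, so it approximates the definition.
    """
    baseline = parse(data)
    blind = []
    for i in range(len(data)):
        mutated = bytearray(data)
        unchanged = True
        for value in range(256):
            if value == data[i]:
                continue  # skip the original value
            mutated[i] = value
            if parse(bytes(mutated)) != baseline:
                unchanged = False
                break
        if unchanged:
            blind.append(i)
    return blind


# Length byte 3, payload "abc", then four trailing bytes the parser ignores.
sample = bytes([3]) + b"abc" + b"JUNK"
print(single_byte_blind_spots(toy_parse, sample))  # prints [4, 5, 6, 7]
```

In this toy case the trailing bytes are blind spots, while the length byte and the payload are not. Brute-force mutation of this kind costs one parser run per byte per candidate value, which is impractical for real files and is one reason a tracking-based approach is attractive.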