Binary analyses based on deep neural networks (DNNs), or neural binary analyses (NBAs), have become a hotly researched topic in recent years. DNNs have been wildly successful at pushing the performance and accuracy envelopes in the natural language and image processing domains. DNNs are therefore highly promising for binary analysis problems that are typically hard because the lossy compilation process discards information. Despite this promise, it is unclear whether the prevailing strategy of repurposing embeddings and model architectures originally developed for other problem domains is sound, given the adversarial contexts in which binary analysis often operates. In this paper, we empirically demonstrate that the current state of the art in neural function boundary detection is vulnerable to both inadvertent and deliberate adversarial attacks. We proceed from the insight that current-generation NBAs are built upon embeddings and model architectures intended to solve syntactic problems. Exploiting this syntactic design focus, we devise a simple, reproducible, and scalable black-box methodology for exploring the space of inadvertent attacks, that is, instruction sequences that could be emitted by common compiler toolchains and configurations. We then show that these inadvertent misclassifications can be exploited by an attacker, serving as the basis for a highly effective black-box adversarial example generation process. We evaluate this methodology against two state-of-the-art neural function boundary detectors: XDA and DeepDi. We conclude with an analysis of the evaluation data and recommendations for how future research might avoid succumbing to similar attacks.