Decompilation transforms low-level programming languages (PLs), such as binary code, into high-level PLs, such as C/C++. It is widely used by analysts performing security analysis, for example vulnerability search and malware analysis, on software systems whose source code is unavailable. However, current decompilation tools typically require substantial expert effort, sometimes spanning years, to craft decompilation rules, and those rules need long-term maintenance as the syntax of the high-level or low-level PL changes. Moreover, an ideal decompiler should concisely generate high-level code that both preserves the functionality of the low-level input and carries semantic information (e.g., meaningful variable names), just like human-written code. Unfortunately, existing techniques based on manually defined rules only restore the low-level PL to a functionally similar high-level PL and remain unable to recover semantic information. In this paper, we propose a novel neural decompilation approach that translates low-level PL into accurate and user-friendly high-level PL, effectively improving its readability and understandability. We implement the proposed approach as SEAM. Evaluations on four real-world applications show that SEAM achieves an average accuracy of 94.41%, which is much better than prior neural machine translation (NMT) models. Finally, we evaluate the effectiveness of semantic information recovery through a questionnaire survey; the average accuracy is 92.64%, which is comparable or superior to state-of-the-art decompilers.