Semantics-Recovering Decompilation through Neural Machine Translation

Decompilation transforms low-level programming languages (PL) (e.g., binary code) into high-level PLs (e.g., C/C++). It is widely used by analysts performing security analysis, such as vulnerability search and malware analysis, on software and systems whose source code is unavailable. However, current decompilation tools usually require substantial expert effort, sometimes years of it, to write the decompilation rules, and those rules need long-term maintenance as the syntax of the high-level or low-level PL changes. Moreover, an ideal decompiler should concisely generate high-level PL that both matches the functionality of the source low-level PL and carries semantic information (e.g., meaningful variable names), just like human-written code. Unfortunately, existing decompilation techniques based on manually defined rules only restore the low-level PL to a functionally similar high-level PL and cannot recover semantic information. In this paper, we propose a novel neural decompilation approach that translates low-level PL into accurate and user-friendly high-level PL, improving its readability and understandability. We implement the proposed approach as SEAM. Evaluations on four real-world applications show that SEAM achieves an average accuracy of 94.41%, much better than prior neural machine translation (NMT) models. Finally, we evaluate the effectiveness of semantic information recovery through a questionnaire survey; the average accuracy is 92.64%, which is comparable or superior to state-of-the-art decompilers.
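To make the distinction between functional recovery and semantic recovery concrete, the following is a minimal illustrative sketch in C. It is not output from SEAM or from any particular decompiler: the function names `sub_401000` and `sum_array`, and all variable names, are hypothetical and chosen only to mimic the placeholder naming that rule-based tools commonly emit versus what a human might write.

```c
#include <stddef.h>

/* Hypothetical output in the style of a rule-based decompiler:
 * functionally correct, but the identifiers carry no meaning. */
int sub_401000(int *a1, int a2)
{
    int v2 = 0;
    int i;

    for (i = 0; i < a2; ++i)
        v2 += *(a1 + i);   /* pointer arithmetic instead of indexing */
    return v2;
}

/* The same logic with semantic information restored:
 * meaningful names and idiomatic array indexing. */
int sum_array(const int *values, int count)
{
    int total = 0;

    for (int i = 0; i < count; ++i)
        total += values[i];
    return total;
}
```

Both functions compute the same result for the same input; only the second conveys intent to a reader, which is the kind of gap the abstract describes.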

