改进由接受监督的机器学习的编译者所推断的种类信息 (Improving type information inferred by decompilers with supervised machine learning) - 专知论文

会员服务 ·

0

binary · INFORMS · Machine Learning · 推断 · 学成 ·

2021 年 1 月 19 日

Improving type information inferred by decompilers with supervised machine learning

翻译：改进由接受监督的机器学习的编译者所推断的种类信息

Javier Escalada,Ted Scully,Francisco Ortin

In software reverse engineering, decompilation is the process of recovering source code from binary files. Decompilers are used when it is necessary to understand or analyze software for which the source code is not available. Although existing decompilers commonly obtain source code with the same behavior as the binaries, that source code is usually hard to interpret and certainly differs from the original code written by the programmer. Massive codebases could be used to build supervised machine learning models aimed at improving existing decompilers. In this article, we build different classification models capable of inferring the high-level type returned by functions, with significantly higher accuracy than existing decompilers. We automatically instrument C source code to allow the association of binary patterns with their corresponding high-level constructs. A dataset is created with a collection of real open-source applications plus a huge number of synthetic programs. Our system is able to predict function return types with a 79.1% F1-measure, whereas the best decompiler obtains a 30% F1-measure. Moreover, we document the binary patterns used by our classifier to allow their addition in the implementation of existing decompilers.

翻译：在软件反向工程中,解压缩是从二进制文档中回收源代码的过程。当需要理解或分析源代码所缺的软件时, 使用解压缩器。虽然现有的解压缩器通常获得源代码, 且其行为与二进制代码相同, 但源代码通常很难解释, 并且肯定与程序员的原始代码不同。大型代码库可以用来构建监管的机器学习模型, 目的是改进现有的解压缩器。在本条中, 我们构建了不同的分类模型, 能够根据功能推断出高层次类型, 精确度远高于现有的解压缩器。我们自动使用 C 源代码, 允许将二进制模式与其相应的高层次构造联系起来。创建数据集时, 收集了真实的开源应用程序, 外加了大量合成程序。我们的系统可以用79.1% F1- 度来预测函数返回类型, 而最佳解压缩器则获得 30% F1- 度。此外, 我们记录了我们的分类器所使用的二进式模式, 以便在实施现有的 deilcompors 中添加它们。

0

相关内容

binary

AAAI2021 | 图神经网络的异质图结构学习，Heterogeneous Graph Structure Learning for Graph Neural Networks

专知会员服务

92+阅读 · 2021年1月20日

【Facebook AI】无监督机器翻译，336页ppt，Unsupervised Machine Translation

专知会员服务

19+阅读 · 2020年11月17日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

专知会员服务

15+阅读 · 2020年3月7日

【2020新书】图机器学习，Graph-Powered Machine Learning

【2020新书】图机器学习，Graph-Powered Machine Learning

专知会员服务

343+阅读 · 2020年1月27日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Membership Inference Attacks on Machine Learning: A Survey

Arxiv

0+阅读 · 2021年3月14日

A Survey of Machine Learning for Computer Architecture and Systems

Arxiv

18+阅读 · 2021年2月16日

Curriculum Learning: A Survey

Arxiv

24+阅读 · 2021年1月25日

A Survey of Adversarial Learning on Graphs

Arxiv

38+阅读 · 2020年3月10日

The Deep Learning Compiler: A Comprehensive Survey

Arxiv

15+阅读 · 2020年2月6日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Read + Verify: Machine Reading Comprehension with Unanswerable Questions

Arxiv

3+阅读 · 2018年11月15日

Building medical image classifiers with very limited data using segmentation networks

Arxiv

4+阅读 · 2018年8月15日

Deep Learning

Arxiv

6+阅读 · 2018年8月3日

Towards Human-Machine Cooperation: Self-supervised Sample Mining for Object Detection

Arxiv

6+阅读 · 2018年3月27日

VIP会员

文章信息

相关主题

Machine Learning

相关VIP内容

AAAI2021 | 图神经网络的异质图结构学习，Heterogeneous Graph Structure Learning for Graph Neural Networks

专知会员服务

92+阅读 · 2021年1月20日

【Facebook AI】无监督机器翻译，336页ppt，Unsupervised Machine Translation

专知会员服务

19+阅读 · 2020年11月17日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

专知会员服务

15+阅读 · 2020年3月7日

【2020新书】图机器学习，Graph-Powered Machine Learning

【2020新书】图机器学习，Graph-Powered Machine Learning

专知会员服务

343+阅读 · 2020年1月27日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能绝不能完全自主》

《人工智能的法律与伦理：军事自主机器独特挑战的深度剖析》316页

从数据到主导：AI与兵棋推演构筑决策优势

《特洛伊木马货柜：武器化集装箱的战略威胁》最新报告

相关资讯

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Membership Inference Attacks on Machine Learning: A Survey

Arxiv

0+阅读 · 2021年3月14日

A Survey of Machine Learning for Computer Architecture and Systems

Arxiv

18+阅读 · 2021年2月16日

Curriculum Learning: A Survey

Arxiv

24+阅读 · 2021年1月25日

A Survey of Adversarial Learning on Graphs

Arxiv

38+阅读 · 2020年3月10日

The Deep Learning Compiler: A Comprehensive Survey

Arxiv

15+阅读 · 2020年2月6日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Read + Verify: Machine Reading Comprehension with Unanswerable Questions

Arxiv

3+阅读 · 2018年11月15日

Building medical image classifiers with very limited data using segmentation networks

Arxiv

4+阅读 · 2018年8月15日

Deep Learning

Arxiv

6+阅读 · 2018年8月3日

Towards Human-Machine Cooperation: Self-supervised Sample Mining for Object Detection

Arxiv

6+阅读 · 2018年3月27日

微信扫码咨询专知VIP会员