Smart Learning to Find Dumb Contracts

We introduce Deep Learning Vulnerability Analyzer (DLVA), a vulnerability detection tool for Ethereum smart contracts based on powerful deep learning techniques for sequential data, adapted for bytecode. We train DLVA to judge bytecode even though the supervising oracle, Slither, can only judge source code. DLVA's training algorithm is general: we "extend" a source code analysis to bytecode without any manual feature engineering, predefined patterns, or expert rules. DLVA's training algorithm is also robust: it overcame a 1.25% rate of mislabeled contracts in its training data and, the student surpassing the teacher, found vulnerable contracts that Slither mislabeled. In addition to extending a source code analyzer to bytecode, DLVA is much faster than conventional tools for smart contract vulnerability detection based on formal methods: DLVA checks a contract for 29 vulnerabilities in 0.2 seconds, a speedup of 10-500x+ compared to traditional tools. DLVA has three key components. Smart Contract to Vector (SC2V) uses neural networks to map arbitrary smart contract bytecode to a high-dimensional floating-point vector. Sibling Detector (SD) classifies a target contract when its vector is close, in Euclidean distance, to the vector of a labeled contract in the training set; although it can judge only 55.7% of the contracts in our test set, on those contracts it has an average accuracy of 97.4% with a false positive rate of only 0.1%. Lastly, Core Classifier (CC) uses neural networks to infer vulnerable contracts regardless of vector distance. DLVA has an overall accuracy of 96.6% with an associated false positive rate of only 3.7%.
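The Sibling Detector idea described above can be illustrated with a minimal sketch. It is not DLVA's implementation: the function names, the `threshold` parameter, and the rule of falling back to the Core Classifier when no sibling is found are all assumptions made for illustration, and the vectors stand in for whatever SC2V would produce.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Vector = Sequence[float]
# A labeled training example: (contract vector, is_vulnerable).
Labeled = Tuple[Vector, bool]


def sibling_detect(target: Vector,
                   training: Sequence[Labeled],
                   threshold: float) -> Optional[bool]:
    """Return the label of the nearest training vector if it lies within
    `threshold` (Euclidean distance); otherwise return None ("cannot judge").
    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    best_label: Optional[bool] = None
    best_dist = math.inf
    for vec, label in training:
        d = math.dist(target, vec)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label if best_dist <= threshold else None


def classify(target: Vector,
             training: Sequence[Labeled],
             threshold: float,
             core_classifier: Callable[[Vector], bool]) -> bool:
    """Assumed combination: use the sibling verdict when one exists,
    otherwise defer to a core classifier that ignores vector distance."""
    verdict = sibling_detect(target, training, threshold)
    if verdict is not None:
        return verdict
    return core_classifier(target)


if __name__ == "__main__":
    training = [([0.0, 0.0], False), ([1.0, 1.0], True)]
    # Stand-in for a trained neural network; always answers "not vulnerable".
    dummy_cc = lambda v: False
    print(sibling_detect([0.9, 1.0], training, 0.2))
    print(sibling_detect([0.5, 0.5], training, 0.2))
    print(classify([0.5, 0.5], training, 0.2, dummy_cc))
```

A tight threshold is what would let such a detector trade coverage for precision: it declines to judge contracts with no close neighbor, which is consistent with the abstract's report that SD judges only part of the test set but does so with a very low false positive rate.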
