Blockchain, as a distributed ledger technology, has become increasingly popular, especially for enabling valuable cryptocurrencies and smart contracts. However, blockchain software systems inevitably contain many bugs. Although bugs in smart contracts have been extensively investigated, security bugs in the underlying blockchain systems remain much less explored. In this paper, we conduct an empirical study of system vulnerabilities in four representative blockchains: Bitcoin, Ethereum, Monero, and Stellar. Specifically, we first design a systematic filtering process that identifies 1,037 vulnerabilities and their 2,317 patches from 34,245 issues/PRs (pull requests) and 85,164 commits on GitHub. With these, we build the first blockchain vulnerability dataset. We then analyze this dataset at three levels: (i) file-level vulnerable module categorization, by identifying and correlating module paths across projects; (ii) text-level vulnerability type clustering, by natural language processing and similarity-based sentence clustering; and (iii) code-level vulnerability pattern analysis, by generating and clustering code change signatures that capture both syntactic and semantic information of patch code fragments. Our analyses reveal three key findings: (i) some blockchain modules are more susceptible than others; notably, the modules related to consensus, wallet, and networking each have over 200 issues; (ii) about 70% of blockchain vulnerabilities are of traditional types, but we also identify four new types specific to blockchains; and (iii) we obtain 21 blockchain-specific vulnerability patterns that capture unique blockchain attributes and statuses, and we demonstrate that they can be used to detect similar vulnerabilities in other popular blockchains, such as Dogecoin, Bitcoin SV, and Zcash.
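The abstract does not specify how the text-level "similarity-based sentence clustering" is performed. As an illustration only, the sketch below groups vulnerability-report titles with a deliberately simple technique: Jaccard similarity over lowercase word tokens plus a greedy single-pass threshold rule. Both the similarity measure and the clustering rule are illustrative substitutes, not necessarily the authors' method, and the names `tokenize`, `jaccard`, `cluster_sentences`, and the 0.3 threshold are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric word tokens; a hypothetical stand-in for
    # the paper's (unspecified) NLP preprocessing.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sentences(sentences: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Greedy single-pass clustering (hypothetical illustration).

    Each sentence joins the first existing cluster whose representative
    (its first member) is at least `threshold` similar; otherwise it
    starts a new cluster. Returns clusters as lists of sentence indices.
    """
    tokens = [tokenize(s) for s in sentences]
    clusters: list[list[int]] = []
    for i, tok in enumerate(tokens):
        for cluster in clusters:
            if jaccard(tok, tokens[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Hypothetical issue titles, invented for illustration.
    titles = [
        "Fix denial of service via oversized block message",
        "Denial of service via oversized block in p2p message",
        "Wallet leaks private key in debug log",
        "Private key leaked to debug log by wallet",
    ]
    print(cluster_sentences(titles))
```

With these made-up titles, the two denial-of-service reports fall into one cluster and the two key-leak reports into another. A greedy threshold rule keeps the sketch short and dependency-free; a real pipeline would more likely use richer text representations (e.g., embeddings) and a more robust clustering algorithm.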