Blockchain has emerged as a prominent technology because of its decentralization and its ability to support cryptocurrencies and smart contracts. However, as distributed ledger software, blockchain inevitably has software issues. While application-level smart contracts have been extensively investigated, the underlying system-level security bugs of blockchains are much less explored. In this paper, we conduct an empirical study of blockchain system vulnerabilities using four representative blockchains: Bitcoin, Ethereum, Monero, and Stellar. Because little CVE information is associated with these blockchain projects, we first design a systematic process that identifies 1,037 vulnerabilities and their 2,317 patches from 34,245 issues/PRs (pull requests) and 85,164 commits on GitHub. On this dataset, we perform three levels of analysis: (i) file-level vulnerable module categorization, by identifying and correlating module paths across projects; (ii) text-level vulnerability type clustering, by combining natural language processing with similarity-based sentence clustering; and (iii) code-level vulnerability pattern analysis, by generating and clustering code change signatures that concisely capture both syntactic and semantic information of patch code fragments. Our analysis yields three key findings: (i) some blockchain modules are more susceptible than others; notably, the modules related to consensus, wallet, and networking are highly susceptible, each with over 200 issues; (ii) around 70% of blockchain vulnerabilities are of traditional types, but we also identify four new types specific to blockchains; and (iii) we obtain 21 blockchain-specific vulnerability patterns and demonstrate that they can be used to detect similar vulnerabilities in other top blockchains (e.g., Dogecoin and Bitcoin SV).
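To make the idea of similarity-based sentence clustering in step (ii) concrete, the following is a minimal sketch and not the pipeline used in the paper. It assumes that each vulnerability is represented by a short title, that similarity is the Jaccard overlap of word sets, and that a title joins the first cluster whose representative it resembles closely enough. The function names, the 0.5 threshold, and the greedy strategy are all illustrative assumptions.

```python
import re


def tokenize(text):
    """Lowercase a title and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_titles(titles, threshold=0.5):
    """Greedy single-pass clustering (an assumed, simplified strategy).

    Each title is compared with the first member (the representative) of
    every existing cluster, in creation order. It joins the first cluster
    whose representative is at least `threshold` similar; otherwise it
    starts a new cluster.
    """
    clusters = []  # each cluster is a list of (title, token_set) pairs
    for title in titles:
        tokens = tokenize(title)
        for cluster in clusters:
            representative_tokens = cluster[0][1]
            if jaccard(tokens, representative_tokens) >= threshold:
                cluster.append((title, tokens))
                break
        else:
            # No sufficiently similar cluster was found.
            clusters.append([(title, tokens)])
    return [[title for title, _ in cluster] for cluster in clusters]
```

Calling `cluster_titles` on a list of issue or PR titles returns a list of clusters, each a list of titles in input order. Under these assumptions, two hypothetical titles such as "Fix integer overflow in fee calculation" and "Fix integer overflow in block fee calculation" fall into the same cluster, while an unrelated title starts its own. A realistic pipeline would likely replace the token-set similarity with a learned sentence representation and a more robust clustering method.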