Due to the risks associated with vulnerabilities in smart contracts, their security has gained significant attention in recent years. However, there is a lack of open datasets on smart contract vulnerabilities and their fixes that allow for data-driven research. To this end, we propose an automated method for mining and classifying Ethereum smart contract vulnerabilities and their corresponding fixes from GitHub and from the Common Vulnerabilities and Exposures (CVE) records in the National Vulnerability Database. We implemented the proposed method in a fully automated framework, which we call AutoMESC. AutoMESC uses seven of the most well-known smart contract security tools to classify and label the collected vulnerabilities by vulnerability type. Furthermore, it collects metadata that can be used in data-intensive smart contract security research (e.g., vulnerability detection, vulnerability classification, severity prediction, and automated repair). We used AutoMESC to construct a sample dataset and made it publicly available. Currently, the dataset contains 6.7K vulnerability-fix pairs from smart contracts written in Solidity. We assess the quality of the constructed dataset in terms of accuracy, provenance, and relevance, and compare it with existing datasets. AutoMESC is designed to collect data continuously and keep the corresponding dataset up to date with newly discovered smart contract vulnerabilities and their fixes from GitHub and CVE records.