Many studies have developed Machine Learning (ML) approaches to detect Software Vulnerabilities (SVs) in functions and the fine-grained code statements that cause such SVs. However, there is little work on leveraging such detection outputs for data-driven SV assessment to provide information about the exploitability, impact, and severity of SVs. This information is important for understanding SVs and prioritizing their fixing. Using large-scale data from 1,782 functions of 429 SVs in 200 real-world projects, we investigate ML models for automating function-level SV assessment tasks, i.e., predicting seven Common Vulnerability Scoring System (CVSS) metrics. We particularly study the value and use of vulnerable statements as inputs for developing the assessment models because SVs in functions originate from these statements. We show that vulnerable statements are 5.8 times smaller in size than non-vulnerable statements, yet exhibit 7.5-114.5% stronger assessment performance in terms of Matthews Correlation Coefficient (MCC). Incorporating the context of vulnerable statements further increases the performance by up to 8.9% (0.64 MCC and 0.75 F1-Score). Overall, we provide initial yet promising ML-based baselines for function-level SV assessment, paving the way for further research in this direction.
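To make the assessment task concrete, the following is a minimal sketch of the per-metric classification setup described above: a text-based classifier trained on vulnerable statements to predict one CVSS metric, evaluated with MCC and F1-Score. The feature extractor, classifier, and example statements are illustrative assumptions, not the models or data used in the study.

# Minimal illustrative sketch (assumed setup, not the paper's exact models):
# predict one CVSS metric (e.g., Severity) from the text of vulnerable statements.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef, f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical inputs: vulnerable statements (optionally with surrounding context),
# each labelled with a class of one CVSS metric.
train_stmts = [
    "strcpy(buf, user_input);",
    'query = "SELECT * FROM users WHERE id=" + uid;',
    "if (len > MAX) return -1;",
    "memcpy(dst, src, n);",
]
train_labels = ["HIGH", "HIGH", "LOW", "MEDIUM"]  # e.g., CVSS Severity classes

test_stmts = ["sprintf(out, fmt, data);", "n = min(n, MAX);"]
test_labels = ["HIGH", "LOW"]

# One classifier would be trained per CVSS metric; only a single metric is shown here.
model = make_pipeline(
    TfidfVectorizer(token_pattern=r"\S+"),  # treat code tokens as "words"
    LogisticRegression(max_iter=1000),
)
model.fit(train_stmts, train_labels)
preds = model.predict(test_stmts)

print("MCC:", matthews_corrcoef(test_labels, preds))
print("Macro F1:", f1_score(test_labels, preds, average="macro"))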