Assessing the exploitability of software vulnerabilities at the time of disclosure is difficult and error-prone, as features extracted via technical analysis by existing metrics are poor predictors of exploit development. Moreover, exploitability assessments suffer from a class bias because "not exploitable" labels could be inaccurate. To overcome these challenges, we propose a new metric, called Expected Exploitability (EE), which reflects, over time, the likelihood that functional exploits will be developed. Key to our solution is a time-varying view of exploitability, a departure from existing metrics, which allows us to learn EE using data-driven techniques from artifacts published after disclosure, such as technical write-ups, proof-of-concept exploits, and social media discussions. Our analysis reveals that prior features proposed for related exploit prediction tasks are not always beneficial for predicting functional exploits, and we design novel feature sets to capitalize on previously under-utilized artifacts. This view also allows us to investigate the effect of label biases on the classifiers. We characterize the noise-generating process for exploit prediction, showing that our problem is subject to class- and feature-dependent label noise, considered the most challenging type. By leveraging domain-specific observations, we then develop techniques to incorporate noise robustness into learning EE. On a dataset of 103,137 vulnerabilities, we show that EE increases precision from 49% to 86% over existing metrics, including two state-of-the-art exploit classifiers, and that the performance of our metric also improves over time. EE scores capture exploitation imminence by distinguishing exploits that are going to be developed in the near future.
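The class bias described above, where some "not exploitable" labels are wrong, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the method used in this work: the function names `precision` and `hide_exploits`, the 30% miss rate, and the toy data are all hypothetical, and the sketch models only class-dependent noise (it ignores the feature-dependent component).

```python
import random


def precision(labels, predictions):
    """Fraction of predicted positives whose label is positive."""
    predicted_pos = [l for l, p in zip(labels, predictions) if p == 1]
    if not predicted_pos:
        return 0.0
    return sum(predicted_pos) / len(predicted_pos)


def hide_exploits(true_labels, miss_rate, seed=0):
    """Hypothetical class-dependent noise: only truly exploitable
    vulnerabilities (1) can be mislabeled as not exploitable (0);
    negatives are never flipped."""
    rng = random.Random(seed)
    return [0 if (y == 1 and rng.random() < miss_rate) else y
            for y in true_labels]


# Toy data (assumed): 50 exploitable and 50 non-exploitable vulnerabilities.
true_labels = [1] * 50 + [0] * 50
# A toy classifier that flags 40 of the positives and 10 of the negatives.
predictions = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40

observed_labels = hide_exploits(true_labels, miss_rate=0.3)

print("precision vs. true labels:    ", precision(true_labels, predictions))
print("precision vs. observed labels:", precision(observed_labels, predictions))
```

Because the noise only removes positives, precision measured against the observed labels can never exceed precision measured against the true labels, which is one reason evaluating exploit predictors on biased ground truth can understate their quality.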