Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories

The lack of comprehensive sources of accurate vulnerability data is a critical obstacle to studying and understanding software vulnerabilities and their corrections. In this paper, we present an approach that combines heuristics drawn from practical experience with machine learning (ML), specifically natural language processing (NLP), to address this problem. Our method has three phases. First, an advisory record containing key information about a vulnerability is extracted from the advisory text, which is written in natural language. Second, heuristics are used to obtain a subset of candidate fix commits from the source code repository of the affected project by filtering out commits known to be irrelevant to the task. Finally, for each candidate commit, our method builds a numerical feature vector that captures the characteristics of the commit relevant to predicting whether it matches the advisory. These feature vectors are then used to build a final ranked list of candidate fix commits. The score the ML model attributes to each feature remains visible to users, which allows them to interpret the predictions. We evaluated our approach with a prototype implementation named Prospector on a manually curated data set of 2,391 known fix commits corresponding to 1,248 public vulnerability advisories. Considering the top 10 commits in the ranked results, our implementation identified at least one fix commit for up to 84.03% of the vulnerabilities, and it ranked a fix commit first for 65.06% of them. In conclusion, our method considerably reduces the effort needed to search OSS repositories for the commits that fix known vulnerabilities.
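The following minimal Python sketch shows the general shape of the three-phase pipeline. It is not Prospector's implementation. All of the following are illustrative assumptions:

- the data classes and the crude keyword extraction;
- a 365-day window around the advisory date, standing in for the "irrelevant commit" filter;
- the three features;
- hand-picked weights, standing in for a trained ML model.

The sketch builds an advisory record, prunes the candidate commits, and computes a feature vector per commit. It then ranks the commits by a score whose per-feature contributions stay inspectable, and checks whether a known fix appears in the top k, as in the top-10 evaluation.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical, simplified data model (not Prospector's actual classes).
@dataclass
class AdvisoryRecord:
    vuln_id: str                 # vulnerability identifier, e.g. a CVE ID
    published: datetime          # advisory publication date
    keywords: set = field(default_factory=set)

@dataclass
class Commit:
    sha: str
    message: str
    timestamp: datetime

def tokenize(text):
    """Lower-cased alphanumeric tokens of at least 4 characters."""
    return {t for t in re.findall(r"[a-z0-9_]+", text.lower()) if len(t) >= 4}

def extract_advisory_record(vuln_id, description, published):
    """Phase 1: keep the ID, the date and crude keywords from the advisory text."""
    return AdvisoryRecord(vuln_id, published, tokenize(description))

def filter_candidates(record, commits, window_days=365):
    """Phase 2: drop commits far from the publication date (assumed heuristic)."""
    window = timedelta(days=window_days)
    return [c for c in commits if abs(c.timestamp - record.published) <= window]

def features(record, commit):
    """Phase 3: a small numeric feature vector for one candidate commit."""
    msg = commit.message.lower()
    return {
        "mentions_vuln_id": float(record.vuln_id.lower() in msg),
        "keyword_overlap": float(len(record.keywords & tokenize(msg))),
        "mentions_fix": float(any(w in msg for w in ("fix", "secur", "vulnerab"))),
    }

# Hand-picked weights standing in for a trained ML model (assumption).
WEIGHTS = {"mentions_vuln_id": 5.0, "keyword_overlap": 1.0, "mentions_fix": 2.0}

def rank(record, commits):
    """Score each candidate, keeping per-feature contributions visible."""
    ranked = []
    for c in filter_candidates(record, commits):
        contrib = {name: WEIGHTS[name] * value
                   for name, value in features(record, c).items()}
        ranked.append((sum(contrib.values()), c.sha, contrib))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked

def hit_at_k(ranked, true_fixes, k=10):
    """True if at least one known fix commit appears in the top-k results."""
    return any(sha in true_fixes for _, sha, _ in ranked[:k])

# Toy usage with made-up data (hypothetical vulnerability ID and commits).
advisory = extract_advisory_record(
    "CVE-2099-0001",
    "Path traversal in ArchiveExtractor allows writing outside the target directory.",
    datetime(2021, 3, 1),
)
commits = [
    Commit("a1", "Fix path traversal in ArchiveExtractor (CVE-2099-0001)", datetime(2021, 2, 20)),
    Commit("b2", "Update README", datetime(2021, 2, 25)),
    Commit("c3", "Refactor ArchiveExtractor", datetime(2015, 1, 1)),
]
ranked = rank(advisory, commits)
print([sha for _, sha, _ in ranked])  # ['a1', 'b2']  (c3 falls outside the window)
print(ranked[0][2])  # {'mentions_vuln_id': 5.0, 'keyword_overlap': 3.0, 'mentions_fix': 2.0}
print(hit_at_k(ranked, {"a1"}, k=1))  # True
```

A linear score is used here only because its per-feature contributions are easy to display. In the approach described in the abstract, the per-feature scores would come from the trained ML model instead.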