Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers


May 7, 2021


Therese Fehrer, Rocío Cabrera Lozoya, Antonino Sabetta, Dario Di Nucci, Damian A. Tamburri

From arXiv. Submitted to ESEC/FSE 2021, Industry Track.

Sources of reliable, code-level information about vulnerabilities that affect open-source software (OSS) are scarce, which hinders the broad adoption of advanced tools that provide code-level detection and assessment of vulnerable OSS dependencies. In this paper, we study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications. In particular, we investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes. We analyze such embeddings for security-relevant and non-security-relevant commits, and we show that, although in isolation they do not differ in a statistically significant manner, it is possible to use them to construct an ML pipeline that achieves results comparable with the state of the art. We also found that combining our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities: the ML models we construct and commit2vec are complementary, the former being more generally applicable, albeit not as accurate.
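To make the idea of using analyzer output as commit features concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a commit can be summarized by the per-rule change in warning counts between the code before and after the commit, and it uses a toy nearest-centroid classifier in place of whatever models the paper actually trains. The rule names, the `warning_delta` helper, the `NearestCentroid` class, and the training data are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical rule identifiers; a real static analyzer defines its own rule set.
RULES: Tuple[str, ...] = ("sql-injection", "null-deref", "unused-variable")


def warning_delta(before: Sequence[str], after: Sequence[str]) -> List[int]:
    """Per-rule change in warning count caused by a commit (after minus before)."""
    b, a = Counter(before), Counter(after)
    return [a[rule] - b[rule] for rule in RULES]


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


class NearestCentroid:
    """Toy classifier (an assumption, not the paper's model):
    assign each sample the label of the closest class centroid."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestCentroid":
        by_label: Dict[int, List[Sequence[float]]] = {}
        for vec, label in zip(X, y):
            by_label.setdefault(label, []).append(vec)
        self.centroids_ = {label: centroid(vs) for label, vs in by_label.items()}
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [
            min(self.centroids_, key=lambda label: sq_dist(x, self.centroids_[label]))
            for x in X
        ]


# Made-up training data: (warnings before, warnings after, label),
# where label 1 marks a security fix and 0 marks any other commit.
train = [
    (["sql-injection", "unused-variable"], ["unused-variable"], 1),
    (["null-deref", "null-deref"], ["null-deref"], 1),
    (["unused-variable"], [], 0),
    ([], ["unused-variable"], 0),
]
X = [warning_delta(before, after) for before, after, _ in train]
y = [label for _, _, label in train]

model = NearestCentroid().fit(X, y)
pred = model.predict([warning_delta(["sql-injection"], [])])
print(pred)
```

The delta representation is only one plausible way to turn analyzer findings into a fixed-length vector. The abstract notes that such embeddings, taken in isolation, do not separate the two classes in a statistically significant way, so a classifier this simple should not be expected to perform well on real commits.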

