Large transformer-based language models demonstrate excellent performance in natural language processing. Given the transferability of the knowledge these models gain in one domain to other related domains, and the closeness of natural languages to high-level programming languages such as C/C++, this work studies how to leverage (large) transformer-based language models for detecting software vulnerabilities and how effective these models are at vulnerability detection tasks. To this end, first, a systematic (cohesive) framework is presented that details source code translation, model preparation, and inference. Then, an empirical analysis is performed on software vulnerability datasets of C/C++ source code containing multiple vulnerabilities related to library function calls, pointer usage, array usage, and arithmetic expressions. Our empirical results demonstrate the good performance of the language models in vulnerability detection. Moreover, these language models achieve better performance metrics, such as F1-score, than contemporary models, namely bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent unit (BiGRU) networks. Experimenting with language models is always challenging because of the computing resources, platforms, libraries, and dependencies they require. Thus, this paper also analyses popular platforms for efficiently fine-tuning these models and presents recommendations for choosing among them.
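To make the described pipeline concrete, the following is a minimal sketch (not the paper's exact implementation) of fine-tuning a transformer-based language model for binary vulnerability classification of C/C++ snippets with the HuggingFace `transformers` library. The model name `microsoft/codebert-base`, the two-label setup, the learning rate, and the toy samples are all illustrative assumptions.

```python
# Minimal sketch: fine-tune an encoder LM to classify code as
# vulnerable (1) or not vulnerable (0). All specifics are assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "microsoft/codebert-base"  # assumed; any encoder LM works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2
)

# Toy C/C++ snippets standing in for a real vulnerability dataset
# (e.g., unsafe library function calls vs. bounded alternatives).
samples = [
    ("strcpy(buf, user_input);", 1),                      # unbounded copy
    ("strncpy(buf, user_input, sizeof(buf) - 1);", 0),    # bounded copy
]

def collate(batch):
    # Tokenize source code as plain text, as the framework's
    # source-code-translation step would feed it to the model.
    codes, labels = zip(*batch)
    enc = tokenizer(list(codes), padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    enc["labels"] = torch.tensor(labels)
    return enc

loader = DataLoader(samples, batch_size=2, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for batch in loader:
    out = model(**batch)   # returns cross-entropy loss over the 2 labels
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At inference time, the fine-tuned model scores an unseen code fragment with a forward pass and an `argmax` over the two logits; standard metrics such as F1-score can then be computed against the dataset labels.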