Large Language Models (LLMs) such as Codex are powerful tools for code completion and code generation, as they are trained on billions of lines of code from publicly available sources. Moreover, these models can generate code snippets from Natural Language (NL) descriptions, having learned programming languages and coding practices from public GitHub repositories. Although LLMs promise effortless NL-driven deployment of software applications, the security of the code they generate has not been extensively investigated or documented. In this work, we present LLMSecEval, a dataset of 150 NL prompts that can be leveraged to assess the security performance of such models. These prompts are NL descriptions of code snippets prone to various security vulnerabilities listed in MITRE's Top 25 Common Weakness Enumeration (CWE) ranking. Each prompt in our dataset comes with a secure implementation example to facilitate comparative evaluations against code produced by LLMs. As a practical application, we show how LLMSecEval can be used to evaluate the security of snippets automatically generated from NL descriptions.
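To make the dataset concept concrete, the sketch below pairs a hypothetical NL prompt with a secure reference implementation for CWE-89 (SQL injection), one of the weaknesses in MITRE's Top 25. The prompt wording, the choice of CWE, and the code are illustrative assumptions, not entries taken verbatim from LLMSecEval.

    # Hypothetical LLMSecEval-style pairing (illustrative; not verbatim
    # dataset content). NL prompt: "Write a function that returns the row
    # for a given username from the 'users' table." The prompt targets
    # CWE-89 (SQL injection); the secure reference below uses a
    # parameterized query so user input cannot alter the SQL statement.
    import sqlite3

    def get_user(conn: sqlite3.Connection, username: str):
        # The '?' placeholder binds the value safely instead of
        # concatenating it into the query string.
        cur = conn.execute("SELECT * FROM users WHERE name = ?", (username,))
        return cur.fetchone()

    # Minimal usage check against an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")
    print(get_user(conn, "alice"))  # -> ('alice', 'alice@example.com')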