16 December 2021

Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions

Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri

From arXiv; accepted for publication at the IEEE Symposium on Security and Privacy 2022.

There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described "AI pair programmer", GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs, and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns about the security of Copilot's code contributions. In this work, we systematically investigate the prevalence of, and the conditions that can cause, insecure code recommendations from GitHub Copilot. To perform this analysis, we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g., those from MITRE's "Top 25" list). We explore Copilot's performance on three distinct code generation axes: how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,689 programs. Of these, we found approximately 40% to be vulnerable.

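To make the idea of a "vulnerable" completion concrete, the following is a minimal sketch of one high-risk weakness class, CWE-89 (SQL injection), which appears on MITRE's Top 25 list. It is an illustration only: the table, the function names (`lookup_vulnerable`, `lookup_safe`), and the payload are hypothetical and are not taken from the paper's scenarios or from any Copilot output.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def lookup_vulnerable(conn, name):
    # CWE-89 pattern: untrusted input is concatenated into the SQL text,
    # so the input can change the structure of the query.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Parameterized query: the input is bound as data, never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "x' OR '1'='1"  # classic injection string

leaked = lookup_vulnerable(conn, payload)   # matches every row
blocked = lookup_safe(conn, payload)        # matches no row
```

With the injection string, the concatenated query returns every stored secret, while the parameterized version returns nothing because no user has that literal name. A study of the kind described in the abstract would classify a completion resembling the first function as vulnerable.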
