Fuzzing Deep-Learning Libraries via Large Language Models - Zhuanzhi (专知) Papers


March 6, 2023

Fuzzing Deep-Learning Libraries via Large Language Models


Yinlin Deng, Chunqiu Steven Xia, Haoran Peng, Chenyuan Yang, Lingming Zhang

From arXiv; accepted at ISSTA 2023

Detecting bugs in Deep Learning (DL) libraries (e.g., TensorFlow/PyTorch) is critical for almost all downstream DL systems in ensuring effectiveness/safety for end users. Meanwhile, traditional fuzzing techniques can hardly be effective for such a challenging domain, since the input DL programs need to satisfy both the input language (e.g., Python) syntax/semantics and the DL API input/shape constraints for tensor computations. To address these limitations, we propose TitanFuzz, the first approach to directly leverage Large Language Models (LLMs) to generate input programs for fuzzing DL libraries. LLMs are titanic models trained on billions of code snippets and can auto-regressively generate human-like code snippets. Our key insight is that modern LLMs can also include numerous code snippets invoking DL library APIs in their training corpora, and thus can implicitly learn both language syntax/semantics and intricate DL API constraints for valid DL program generation. More specifically, we use both generative and infilling LLMs (e.g., Codex/InCoder) to generate and mutate valid/diverse input DL programs for fuzzing. Our experimental results demonstrate that TitanFuzz can achieve 30.38%/50.84% higher code coverage than state-of-the-art fuzzers on TensorFlow/PyTorch. Furthermore, TitanFuzz is able to detect 65 bugs, with 41 already confirmed as previously unknown bugs. This paper demonstrates that modern titanic LLMs can be leveraged to directly perform both generation-based and mutation-based fuzzing studied for decades, while being fully automated, generalizable, and applicable to domains challenging for traditional approaches (such as DL systems). We hope TitanFuzz can stimulate more work in this promising direction of LLMs for fuzzing.
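To make the "generate, then mutate by infilling" idea in the abstract more concrete, the following is a minimal sketch of a mutation-based fuzzing loop. It is not TitanFuzz's implementation. Everything here is an assumption made for illustration: the `<SPAN>` mask token, the `stub_infill` function (a canned stand-in for an infilling LLM such as InCoder), the line-level masking strategy, and the plain-Python seed program (used in place of a TensorFlow/PyTorch program so the sketch runs without any DL library or model).

```python
import random
from typing import List

MASK = "<SPAN>"  # hypothetical placeholder token, chosen for this sketch only


def mask_line(program: str, rng: random.Random) -> str:
    """Replace one randomly chosen line of the program with the mask token."""
    lines = program.splitlines()
    lines[rng.randrange(len(lines))] = MASK
    return "\n".join(lines)


def stub_infill(masked: str, rng: random.Random) -> str:
    """Stand-in for an infilling LLM: fill the mask with a canned snippet.

    A real setup would ask a code model to complete the masked span.
    """
    candidates = [
        "x = [1.0, 2.0, 3.0]",
        "x = []",
        "y = sum(x) / len(x)",
        "y = max(x)",
    ]
    return masked.replace(MASK, rng.choice(candidates), 1)


def runs_cleanly(program: str) -> bool:
    """Execute a candidate program and report whether it raised an exception."""
    try:
        exec(compile(program, "<candidate>", "exec"), {})
        return True
    except Exception:
        return False


def fuzz(seed: str, rounds: int, rng_seed: int = 0) -> List[str]:
    """Grow a pool of distinct, valid programs by repeated mask-and-infill."""
    rng = random.Random(rng_seed)
    pool = [seed]
    for _ in range(rounds):
        parent = rng.choice(pool)
        child = stub_infill(mask_line(parent, rng), rng)
        # Keep only new programs that execute without error.
        if child not in pool and runs_cleanly(child):
            pool.append(child)
    return pool


SEED = "x = [1.0, 2.0, 3.0]\ny = sum(x) / len(x)"

if __name__ == "__main__":
    for program in fuzz(SEED, rounds=20):
        print(program, end="\n---\n")
```

In this sketch, invalid mutants (for example, one that divides by the length of an empty list, or one that uses `x` before defining it) are discarded, while valid ones are fed back into the pool as future parents. A real fuzzer would additionally need a test oracle to decide whether a valid program exposes a library bug; the abstract does not describe one, so none is modeled here.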

