With the success of large language models (LLMs) of code and their use as code assistants (e.g., Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge into the prompt design process become important. In this work, we propose a framework called the Repo-Level Prompt Generator that learns to generate example-specific prompts using a set of rules. These rules take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g., imports, parent class files). Our technique does not require access to the weights of the LLM, making it applicable in cases where we have only black-box access to the model. We conduct experiments on the task of single-line code auto-completion using code repositories taken from the Google Code archives. We demonstrate that an oracle constructed from our proposed rules gives up to a 36% relative improvement over Codex, showing the quality of the rules. Further, we show that when we train a model to select the best rule, we achieve significant performance gains over Codex. The code for our work can be found at: https://github.com/shrivastavadisha/repo_level_prompt_generation.
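Although the abstract does not include implementation details, a minimal sketch of the rule-based, repo-level prompt construction it describes might look like the following. The rule names, helper functions, and truncation budget below are hypothetical illustrations under the assumption that each rule retrieves context from a different part of the repository (e.g., the current file's imports or a parent class file) and that the retrieved context is prepended to the default prompt; they are not the authors' actual implementation.

```python
# Sketch (not the authors' code): rule-based prompt construction from
# repository context. Each rule maps (target file, repo root) to a string
# of repo-level context; the chosen context is prepended to the default
# context (the code preceding the completion point).
import ast
from pathlib import Path
from typing import Callable, Dict


def context_from_imports(target_file: Path) -> str:
    """Return the source of the import statements in the target file."""
    source = target_file.read_text()
    lines = source.splitlines()
    tree = ast.parse(source)
    imports = [
        "\n".join(lines[node.lineno - 1:node.end_lineno])
        for node in ast.walk(tree)
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]
    return "\n".join(imports)


def context_from_parent_class(target_file: Path, repo_root: Path) -> str:
    """Return the contents of repo files named after base classes used in the target file."""
    tree = ast.parse(target_file.read_text())
    bases = {
        base.id
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
        for base in node.bases
        if isinstance(base, ast.Name)
    }
    chunks = [
        path.read_text()
        for path in repo_root.rglob("*.py")
        if path.stem in bases and path != target_file
    ]
    return "\n".join(chunks)


# Hypothetical rule set: each entry is one "prompt proposal".
RULES: Dict[str, Callable[[Path, Path], str]] = {
    "imports": lambda f, root: context_from_imports(f),
    "parent_class": context_from_parent_class,
}


def build_prompt(rule: str, target_file: Path, repo_root: Path,
                 default_context: str, budget: int = 2048) -> str:
    """Prepend rule-selected repo context to the default LLM context."""
    repo_context = RULES[rule](target_file, repo_root)
    # Truncate the repo context so the default context always fits in the prompt.
    return repo_context[:budget] + "\n" + default_context
```

In this reading, the learned component of the framework would act as a selector over such rules, choosing which proposal to send to the black-box LLM for each completion example.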