With the success of large language models (LLMs) of code and their use as code assistants (e.g., Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge into the prompt design process become important. In this work, we propose a framework called the Repo-Level Prompt Generator that learns to generate example-specific prompts using a set of rules. These rules take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g., imports, parent class files). Our technique does not require access to the weights of the LLM, making it applicable in cases where we have only black-box access to the model. We conduct experiments on the task of single-line code auto-completion using code repositories taken from the Google Code archives. We demonstrate that an oracle constructed from our proposed rules gives up to a 36% relative improvement over Codex, showing the quality of the rules. Further, we show that when we train a model to select the best rule, we achieve significant performance gains over Codex. The code for our work can be found at: https://github.com/shrivastavadisha/repo_level_prompt_generation.
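Although the abstract does not include implementation details, a minimal sketch of the rule-based, repo-level prompt construction it describes might look like the following. The rule names, helper functions, and truncation budget below are hypothetical illustrations under the assumption that each rule retrieves context from a different part of the repository (e.g., the current file's imports or a parent class file) and that the retrieved context is prepended to the default prompt; they are not the authors' actual implementation.

```python
# Sketch (not the authors' code): rule-based prompt construction from
# repository context. Each rule maps (target file, repo root) to a string
# of repo-level context; the chosen context is prepended to the default
# context (the code preceding the completion point).
import ast
from pathlib import Path
from typing import Callable, Dict


def context_from_imports(target_file: Path) -> str:
    """Return the source of the import statements in the target file."""
    source = target_file.read_text()
    lines = source.splitlines()
    tree = ast.parse(source)
    imports = [
        "\n".join(lines[node.lineno - 1:node.end_lineno])
        for node in ast.walk(tree)
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]
    return "\n".join(imports)


def context_from_parent_class(target_file: Path, repo_root: Path) -> str:
    """Return the contents of repo files named after base classes used in the target file."""
    tree = ast.parse(target_file.read_text())
    bases = {
        base.id
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
        for base in node.bases
        if isinstance(base, ast.Name)
    }
    chunks = [
        path.read_text()
        for path in repo_root.rglob("*.py")
        if path.stem in bases and path != target_file
    ]
    return "\n".join(chunks)


# Hypothetical rule set: each entry is one "prompt proposal".
RULES: Dict[str, Callable[[Path, Path], str]] = {
    "imports": lambda f, root: context_from_imports(f),
    "parent_class": context_from_parent_class,
}


def build_prompt(rule: str, target_file: Path, repo_root: Path,
                 default_context: str, budget: int = 2048) -> str:
    """Prepend rule-selected repo context to the default LLM context."""
    repo_context = RULES[rule](target_file, repo_root)
    # Truncate the repo context so the default context always fits in the prompt.
    return repo_context[:budget] + "\n" + default_context
```

In this reading, the learned component of the framework would act as a selector over such rules, choosing which proposal to send to the black-box LLM for each completion example.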