With the success of large language models (LLMs) of code and their use as code assistants (e.g., Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge into the prompt design process become important. In this work, we propose a framework called the Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g., imports, parent class files). Our technique does not require any access to the weights of the LLM, making it applicable in cases where we have only black-box access to the LLM. We conduct experiments on the task of single-line code autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a remarkably high relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we achieve significant performance gains over Codex and other baselines. The code for our work can be found at: \url{https://github.com/shrivastavadisha/repo_level_prompt_generation}.