Software developers write a great deal of source code and documentation. While implementing or documenting software, they often recall code or code summaries that they have written in the past. To mimic this behavior, we propose REDCODER, a retrieval-augmented framework that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. REDCODER is distinctive in two ways. First, it extends a state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that contain unimodal instances (only code or only natural language descriptions) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets for code generation and summarization in Java and Python, and the results support the effectiveness of the proposed retrieval-augmented framework.
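The retrieve-then-supplement idea described above can be illustrated with a minimal sketch. It is not the REDCODER implementation: the `embed` function is a toy bag-of-tokens stand-in for a trained dense encoder, and the names `Entry`, `retrieve`, `augment`, and the `[SEP]` separator are assumptions introduced only for illustration, as is the choice to supplement the input by simple concatenation.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One retrieval-database instance (hypothetical structure).

    Unimodal entries set only one field; bimodal entries set both.
    """
    code: Optional[str] = None
    description: Optional[str] = None

    def parts(self) -> List[str]:
        # Keep only the modalities that are actually present.
        return [p for p in (self.code, self.description) if p]


def embed(text: str) -> Dict[str, float]:
    """Toy stand-in for a dense encoder: L2-normalised token counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {tok: v / norm for tok, v in counts.items()}


def score(query_vec: Dict[str, float], doc_vec: Dict[str, float]) -> float:
    """Inner product between two sparse vectors."""
    return sum(w * doc_vec.get(tok, 0.0) for tok, w in query_vec.items())


def retrieve(query: str, database: List[Entry], k: int = 2) -> List[Entry]:
    """Return the k entries whose embeddings score highest against the query."""
    q = embed(query)
    ranked = sorted(
        database,
        key=lambda e: score(q, embed(" ".join(e.parts()))),
        reverse=True,
    )
    return ranked[:k]


def augment(query: str, retrieved: List[Entry], sep: str = " [SEP] ") -> str:
    """Concatenate the query with the retrieved entries (assumed scheme)."""
    pieces = [query]
    for entry in retrieved:
        pieces.extend(entry.parts())
    return sep.join(pieces)


# Small mixed database: two bimodal entries and one unimodal (code-only) entry.
DATABASE = [
    Entry("def add(a, b): return a + b", "add two numbers"),
    Entry("def read_file(path): return open(path).read()", "read a file from disk"),
    Entry("def reverse(s): return s[::-1]"),
]

if __name__ == "__main__":
    query = "read the contents of a file"
    top = retrieve(query, DATABASE, k=1)
    print(augment(query, top))
```

The augmented string would then be passed to a generation or summarization model in place of the bare query. In a realistic setting the toy `embed` would be replaced by a learned encoder and the linear scan by an approximate nearest-neighbour index.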