Pre-trained language models have achieved promising results on code retrieval, where a natural language documentation query is used to find the most relevant existing code snippet. However, existing models focus only on optimizing documentation-code pairs by embedding them into a latent space, without incorporating external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the human retrieval process of sketching an answer before searching, we use a powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that, rather than retrieving the target code snippet from the documentation query alone, it is helpful to augment the documentation query with its generation counterpart: code snippets produced by the code generation model. To the best of our knowledge, this is the first attempt to leverage a code generation model to enhance code retrieval. We achieve new state-of-the-art results on the CodeSearchNet benchmark and significantly surpass the baselines.
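The idea of generation-augmented query expansion can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: `toy_generator` is a hypothetical stand-in for a code generation model, a bag-of-tokens vector with cosine similarity stands in for a learned retriever, and plain concatenation of the query with the generated snippet is only one possible way to form the expanded query.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional


def embed(text: str) -> Counter:
    # Toy embedding (assumption): lowercase bag of alphabetic/numeric tokens.
    return Counter(re.findall(r"[a-z]+|\d+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: str,
    corpus: List[str],
    generate_code: Optional[Callable[[str], str]] = None,
) -> int:
    # With a generator, the query is expanded by appending the generated
    # snippet (concatenation is an assumption of this sketch).
    text = query if generate_code is None else query + "\n" + generate_code(query)
    query_vec = embed(text)
    scores = [cosine(query_vec, embed(snippet)) for snippet in corpus]
    # Return the index of the highest-scoring snippet.
    return max(range(len(corpus)), key=scores.__getitem__)


def toy_generator(query: str) -> str:
    # Hypothetical stand-in for a code generation model: returns a fixed sketch.
    return (
        "def read_json(path):\n"
        "    with open(path) as f:\n"
        "        return json.load(f)"
    )


corpus = [
    "def load_config(path):\n    with open(path) as f:\n        return json.load(f)",
    "def write_file(path, text):\n    with open(path, 'w') as f:\n        f.write(text)",
]
query = "parse a settings file"

plain_hit = retrieve(query, corpus)                     # query only
expanded_hit = retrieve(query, corpus, toy_generator)   # query + generated code
```

In this toy setup the unexpanded query shares only the word "file" with the corpus, so it matches `write_file`; the expanded query shares code-level tokens such as `json` and `load` with `load_config` and matches that snippet instead. This is the intuition behind sketching an answer before searching, shown here with toy components only.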