Code retrieval allows software engineers to search for code with a natural language query, and it relies on both natural language processing and software engineering techniques. There have been several attempts at code retrieval, ranging from searching code snippets to searching whole functions. In this paper, we introduce Augmented Code (AugmentedCode) retrieval, which takes advantage of existing information within the code and constructs an augmented programming language to improve the performance of code retrieval models. We curated a large corpus of Python code and showcase the framework and the results of the augmented programming language, which outperforms CodeSearchNet and CodeBERT with a Mean Reciprocal Rank (MRR) of 0.73 and 0.96, respectively. The best-performing fine-tuned augmented code retrieval model is published on HuggingFace at https://huggingface.co/Fujitsu/AugCode and a demonstration video is available at https://youtu.be/mnZrUTANjGs .
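For readers unfamiliar with the metric quoted above, Mean Reciprocal Rank averages, over all queries, the reciprocal of the position at which the correct code item appears in the ranked results. The following is a minimal sketch of that standard definition, not the evaluation code used in this work; the function name `mean_reciprocal_rank`, the convention that a rank of 0 means "not retrieved", and the example ranks are assumptions made for illustration.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Return the mean of 1/rank over all queries.

    Each entry is the 1-based position of the correct code item in the
    ranked results for one query. A value of 0 (an assumed convention
    here) means the correct item was not retrieved and contributes 0.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r if r > 0 else 0.0 for r in ranks) / len(ranks)


# Hypothetical ranks for four queries (illustrative values, not results
# reported in the paper).
example_ranks = [1, 2, 1, 4]
mrr = mean_reciprocal_rank(example_ranks)
print(f"MRR = {mrr:.4f}")
```

Under this definition, an MRR close to 1 means the correct code item is ranked first for almost every query, which is how the 0.73 and 0.96 figures above can be read.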