Code retrieval is a common practice for programmers who want to reuse existing code snippets from open-source repositories. Given a user query (i.e., a natural language description), code retrieval aims to find the most relevant snippets in a set of candidate code snippets. The main challenge of effective code retrieval lies in bridging the semantic gap between natural language descriptions and code snippets. With the ever-increasing amount of available open-source code, recent studies have turned to neural networks to learn the semantic matching relationships between the two sources. Statement-level dependency information, which highlights the dependency relations among program statements during execution, reflects the structural importance of each statement in the code. This information is favorable for accurately capturing code semantics but has never been explored for the code retrieval task. In this paper, we propose CRaDLe, a novel approach for Code Retrieval based on statement-level semantic Dependency Learning. Specifically, CRaDLe distills code representations by fusing dependency and semantic information at the statement level, and then learns a unified vector representation for each code and description pair to model their matching relationship. Comprehensive experiments and analysis on real-world datasets show that the proposed approach accurately retrieves code snippets for a given query and significantly outperforms state-of-the-art approaches on the task.
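To make the retrieval task concrete, the following is a minimal sketch of the generic "embed, score, rank" step that code retrieval relies on. It is not CRaDLe's model: the `embed` function is a hypothetical bag-of-tokens stand-in for a learned encoder, and `cosine`, `retrieve`, and the sample snippets are illustrative assumptions rather than anything taken from the paper.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a learned encoder: bag-of-tokens counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, snippets, top_k=1):
    """Rank candidate snippets by similarity to the query, best first."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:top_k]


# Illustrative candidates (assumed, not from the paper's datasets).
snippets = [
    "def read_file(path): return open(path).read()",
    "def sort_list(items): return sorted(items)",
    "def add(a, b): return a + b",
]
best = retrieve("read file contents from path", snippets)
```

A token-overlap encoder like this one only matches a query to code that shares its vocabulary, which is the semantic gap the abstract describes; a learned encoder would replace `embed` while the ranking step stays the same.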