Code search aims to retrieve accurate code snippets based on a natural language query to improve software productivity and quality. With the massive amount of code available on platforms such as GitHub and Stack Overflow, identifying and localizing the precise code is critical for software developers. In addition, deep learning has recently been widely applied to different code-related scenarios, e.g., vulnerability detection and source code summarization. However, automated deep code search is still challenging since it requires a high-level semantic mapping between code and natural language queries. Most existing deep learning-based approaches for code search rely on sequential text, i.e., they feed the program and the query as flat sequences of tokens to learn program semantics, while structural information is not fully considered. Furthermore, the widely adopted Graph Neural Networks (GNNs) have proved their effectiveness in learning program semantics; however, they struggle to capture global dependencies in the constructed graph, which limits the model's learning capacity. To address these challenges, in this paper, we design a novel neural network framework, named GraphSearchNet, to enable effective and accurate source code search by jointly learning the rich semantics of both source code and natural language queries. Specifically, we construct graphs for the source code and the queries and apply a bidirectional GGNN (BiGGNN) to capture their local structural information. Furthermore, we enhance BiGGNN with a multi-head attention module that supplements the global dependencies BiGGNN misses, improving the model's learning capacity. Extensive experiments on the Java and Python datasets from the public benchmark CodeSearchNet confirm that GraphSearchNet outperforms current state-of-the-art approaches.
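The idea of pairing local, bidirectional graph propagation with global multi-head attention can be illustrated with a minimal NumPy sketch. This is not the GraphSearchNet implementation: the function names, the simplified gate (a real GGNN uses a GRU cell), the untrained random projection weights, the element-wise sum used to fuse the local and global views, the mean pooling, and the cosine scoring are all assumptions made here for illustration.

```python
import numpy as np


def bidirectional_message_passing(h, adj, hops=2):
    """Simplified stand-in for BiGGNN (hypothetical helper).

    h:   (n, d) node embeddings
    adj: (n, n) adjacency matrix, adj[i, j] = 1 for a directed edge i -> j
    """
    def row_normalise(a):
        deg = a.sum(axis=1, keepdims=True)
        return a / np.maximum(deg, 1.0)  # nodes without neighbours get a zero message

    out_adj = row_normalise(adj)    # row i averages over successors of node i
    in_adj = row_normalise(adj.T)   # row i averages over predecessors of node i
    for _ in range(hops):
        msg = out_adj @ h + in_adj @ h            # fuse both edge directions
        gate = 1.0 / (1.0 + np.exp(-msg))         # simplified gate, not a full GRU
        h = gate * h + (1.0 - gate) * np.tanh(msg)
    return h


def multi_head_self_attention(h, num_heads, rng):
    """Scaled dot-product self-attention over all nodes (untrained weights)."""
    n, d = h.shape
    assert d % num_heads == 0, "embedding size must be divisible by num_heads"
    dk = d // num_heads
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    def split_heads(x):
        return x.reshape(n, num_heads, dk).transpose(1, 0, 2)  # (heads, n, dk)

    q, k, v = split_heads(h @ wq), split_heads(h @ wk), split_heads(h @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dk)            # (heads, n, n)
    scores -= scores.max(axis=-1, keepdims=True)               # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v).transpose(1, 0, 2).reshape(n, d)      # merge heads


def encode_graph(h, adj, num_heads=2, seed=0):
    """Combine local (graph) and global (attention) views, then mean-pool."""
    rng = np.random.default_rng(seed)
    local_view = bidirectional_message_passing(h, adj)
    global_view = multi_head_self_attention(h, num_heads, rng)
    return (local_view + global_view).mean(axis=0)             # assumed fusion


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


# Toy example: a 4-node code graph and a 3-node query graph with random features.
rng = np.random.default_rng(42)
code_adj = np.zeros((4, 4))
code_adj[[0, 1, 2], [1, 2, 3]] = 1.0      # edges 0->1, 1->2, 2->3
query_adj = np.zeros((3, 3))
query_adj[[0, 1], [1, 2]] = 1.0           # edges 0->1, 1->2

code_vec = encode_graph(rng.standard_normal((4, 8)), code_adj)
query_vec = encode_graph(rng.standard_normal((3, 8)), query_adj)
score = cosine_similarity(code_vec, query_vec)
print(f"similarity between query and code: {score:.3f}")
```

In a trained system the projection weights and the gate would be learned, and candidate snippets would be ranked by their similarity to the query vector; the sketch only shows how the two views can be computed and combined.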