Code search aims to retrieve accurate code snippets for a natural language query, improving software productivity and quality. With the massive number of programs available (e.g., on GitHub or Stack Overflow), identifying and localizing the precise code is critical for software developers. Deep learning has recently been widely applied to code-related tasks such as vulnerability detection and source code summarization. However, automated deep code search remains challenging because it requires a high-level semantic mapping between code and natural language queries. Most existing deep learning-based approaches to code search rely on sequential text, i.e., they feed the program and the query as flat token sequences to learn program semantics, without fully considering structural information. Furthermore, although the widely adopted Graph Neural Networks (GNNs) have proved effective in learning program semantics, they struggle to capture global dependencies in the constructed graph, which limits the model's learning capacity. To address these challenges, in this paper we design a novel neural network framework, named GraphSearchNet, that enables effective and accurate source code search by jointly learning the rich semantics of both source code and natural language queries. Specifically, we construct graphs for the source code and the queries and apply a bidirectional GGNN (BiGGNN) to capture their local structural information. Furthermore, we enhance BiGGNN with a multi-head attention module that supplements the global dependencies BiGGNN misses, improving the model's learning capacity.
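The split between local structure and global dependencies described above can be illustrated with a small sketch. It is not the GraphSearchNet implementation. It assumes a simplified message-passing step that averages incoming and outgoing neighbours and omits the gated (GRU-style) update a GGNN would use. It also assumes a single attention head with no learned projections, where the abstract names multi-head attention. All function names, the additive combination of the two views, and the max pooling are hypothetical choices made for illustration.

```python
import math


def bidirectional_step(h, edges):
    """One simplified round of bidirectional message passing.

    h: list of node vectors; edges: list of (src, dst) pairs.
    Assumption: messages are plain neighbour averages; the gated update
    of a real GGNN is left out to keep the sketch short.
    """
    n, d = len(h), len(h[0])
    incoming = [[] for _ in range(n)]
    outgoing = [[] for _ in range(n)]
    for src, dst in edges:
        incoming[dst].append(src)
        outgoing[src].append(dst)

    def mean(indices):
        if not indices:
            return [0.0] * d
        return [sum(h[j][k] for j in indices) / len(indices) for k in range(d)]

    updated = []
    for i in range(n):
        fwd = mean(incoming[i])  # messages along the edge direction
        bwd = mean(outgoing[i])  # messages against the edge direction
        updated.append([h[i][k] + fwd[k] + bwd[k] for k in range(d)])
    return updated


def self_attention(h):
    """Single-head scaled dot-product self-attention over all nodes.

    Every node attends to every other node, so information can flow
    between nodes that are far apart in the graph.
    """
    d = len(h[0])
    out = []
    for query in h:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in h]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(weights[j] * h[j][k] for j in range(len(h))) for k in range(d)])
    return out


def encode(h, edges, hops=2):
    """Combine local (message passing) and global (attention) views.

    Assumption: the two views are added and max-pooled into one vector.
    """
    local = h
    for _ in range(hops):
        local = bidirectional_step(local, edges)
    global_view = self_attention(local)
    combined = [[a + b for a, b in zip(l, g)] for l, g in zip(local, global_view)]
    return [max(column) for column in zip(*combined)]
```

With `hops` rounds of message passing, a node only sees neighbours at most `hops` edges away, while the attention step lets every node see all others in one pass. That difference is the limitation the abstract attributes to GNNs and the role it assigns to the attention module.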