Source code summarization aims to generate concise and clear natural language descriptions of source code. Well-written code summaries help programmers participate in the software development and maintenance process. To learn semantic representations of source code, recent efforts focus on incorporating the syntactic structure of code into neural networks such as the Transformer. Such Transformer-based approaches capture long-range dependencies better than other neural networks, including Recurrent Neural Networks (RNNs). However, most of them do not consider the structural relative correlations between tokens, e.g., their relative positions in Abstract Syntax Trees (ASTs), which are beneficial for learning code semantics. To model this structural dependency, we propose a Structural Relative Position guided Transformer, named SCRIPT. SCRIPT first obtains the structural relative positions between tokens by parsing the ASTs of source code, and then feeds them into two types of Transformer encoders. One Transformer directly adjusts the input according to the structural relative distance; the other encodes the structural relative positions while computing the self-attention scores. Finally, we stack these two types of Transformer encoders to learn representations of source code. Experimental results show that the proposed SCRIPT outperforms the state-of-the-art methods by at least 1.6%, 1.4%, and 2.8% in terms of BLEU, ROUGE-L, and METEOR on benchmark datasets, respectively. We further show how the proposed SCRIPT captures structural relative dependencies.
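The abstract does not give the exact formulation of the second encoder type, so the following is only a minimal sketch of how structural relative positions can bias self-attention scores, in the spirit of relation-aware self-attention (Shaw et al., 2018) rather than the authors' implementation. The function name `structural_attention`, the clipping parameter `max_dist`, and all tensor names are illustrative assumptions.

```python
# Minimal sketch: self-attention biased by structural relative positions.
# Not the authors' code; an assumed relation-aware attention formulation.
import torch
import torch.nn.functional as F

def structural_attention(x, rel_dist, w_q, w_k, w_v, rel_emb, max_dist=8):
    """x: (seq, d_model) token representations.
    rel_dist: (seq, seq) integer structural distances between AST nodes.
    rel_emb: (2*max_dist+1, d_k) learned embeddings of clipped distances.
    max_dist is an assumed clipping range for the distances."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # each (seq, d_k)
    d_k = q.size(-1)
    # Clip structural distances to a fixed range, then look up embeddings.
    clipped = rel_dist.clamp(-max_dist, max_dist) + max_dist  # (seq, seq)
    r = rel_emb[clipped]                                      # (seq, seq, d_k)
    # Attention score = content-content term + content-position term.
    scores = q @ k.T + torch.einsum('id,ijd->ij', q, r)
    attn = F.softmax(scores / d_k ** 0.5, dim=-1)
    return attn @ v
```

Under this reading, the first encoder type would instead modify the input representations directly as a function of the structural relative distance, before any attention is computed; stacking the two lets the model see structure both in its inputs and in its attention weights.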