Code summarization with deep learning has been widely studied in recent years. Current deep learning models for code summarization generally follow the principles of neural machine translation and adopt the encoder-decoder framework, where the encoder learns semantic representations from source code and the decoder transforms the learnt representations into human-readable text that describes the functionality of code snippets. Although these models achieve new state-of-the-art performance, we notice that they often either generate less fluent summaries or fail to capture the core functionality, since they usually focus on a single type of code representation. We therefore propose GypSum, a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model. We introduce edges related to the control flow of a code snippet into the abstract syntax tree for graph construction, and design two encoders to learn from the graph and the token sequence of source code, respectively. We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the two representations, and propose a dual-copy mechanism to facilitate summary generation. Experimental results demonstrate the superior performance of GypSum over existing code summarization models.
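To make the fusion step concrete, the sketch below shows one plausible way a decoder sublayer could attend over both encoder outputs and gate the two contexts together. It is a minimal illustration only: the class name `HybridCrossAttention`, the sigmoid-gated mixing scheme, and the PyTorch framing are our assumptions here, not the exact sublayer described above.

```python
import torch
import torch.nn as nn

class HybridCrossAttention(nn.Module):
    """Hypothetical sketch: a decoder sublayer attending to two encoder
    memories (graph encoder and token encoder) and fusing the contexts."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        # One cross-attention block per representation type.
        self.attn_graph = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_token = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate deciding, per position, how to mix the two contexts
        # (an assumed fusion choice, not necessarily the paper's).
        self.gate = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, tgt, graph_mem, token_mem):
        # tgt:       (batch, tgt_len, d_model) decoder hidden states
        # graph_mem: (batch, g_len,  d_model)  graph-encoder outputs
        # token_mem: (batch, t_len,  d_model)  token-encoder outputs
        ctx_g, _ = self.attn_graph(tgt, graph_mem, graph_mem)
        ctx_t, _ = self.attn_token(tgt, token_mem, token_mem)
        g = torch.sigmoid(self.gate(torch.cat([ctx_g, ctx_t], dim=-1)))
        fused = g * ctx_g + (1.0 - g) * ctx_t
        # Residual connection plus layer norm, as in a standard Transformer.
        return self.norm(tgt + fused)
```

A gated sum is just one option; concatenation followed by a projection, or stacking the two cross-attention sublayers sequentially, would serve the same purpose of letting the decoder draw on both the structural (graph) and lexical (token) views of the code.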