Retrieval-Augmented Generation for Code Summarization via Hybrid GNN

Source code summarization aims to generate natural language summaries from structured code snippets so that readers can better understand what the code does. However, automatic code summarization is challenging because of the complexity of source code and the language gap between source code and natural language summaries. Most previous approaches rely on either retrieval-based methods (which can take advantage of similar examples in the retrieval database, but generalize poorly) or generation-based methods (which generalize better, but cannot take advantage of similar examples). This paper proposes a novel retrieval-augmented mechanism that combines the benefits of both. Furthermore, to mitigate the limitation of Graph Neural Networks (GNNs) in capturing the global graph structure of source code, we propose a novel attention-based dynamic graph to complement the static graph representation of the source code, and design a hybrid message passing GNN that captures both local and global structural information. To evaluate the proposed approach, we release a new challenging benchmark, crawled from diverse large-scale open-source C projects (more than 95k unique functions in total). Our method achieves state-of-the-art performance, improving on existing methods by 1.42, 2.44 and 1.29 in terms of BLEU-4, ROUGE-L and METEOR, respectively.
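To make the idea of combining a static code graph with an attention-based dynamic graph more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`static_adjacency`, `dynamic_adjacency`, `hybrid_step`), the scaled dot-product attention, and the fixed `mix` weight are all hypothetical choices made for this example. The abstract does not specify how the dynamic graph is built or how the two kinds of messages are fused.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def static_adjacency(num_nodes, edges):
    """Row-normalized adjacency with self-loops built from a fixed edge list.

    Stands in for a static code graph (for example, syntax-tree edges).
    """
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    return [[x / sum(row) for x in row] for row in adj]


def dynamic_adjacency(features):
    """Dense attention graph: softmax over scaled dot products of all node pairs.

    Assumption: plain dot-product attention without learned projections.
    """
    dim = len(features[0])
    scores = [
        [sum(a * b for a, b in zip(fi, fj)) / math.sqrt(dim) for fj in features]
        for fi in features
    ]
    return [softmax(row) for row in scores]


def propagate(adj, features):
    """One round of message passing: weighted sum of neighbor features."""
    n, dim = len(features), len(features[0])
    return [
        [sum(adj[i][j] * features[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]


def hybrid_step(features, edges, mix=0.5):
    """Blend local (static graph) and global (attention graph) messages.

    `mix` is a hypothetical fixed weight; a real model would likely learn the fusion.
    """
    local = propagate(static_adjacency(len(features), edges), features)
    global_ = propagate(dynamic_adjacency(features), features)
    return [
        [mix * l + (1.0 - mix) * g for l, g in zip(l_row, g_row)]
        for l_row, g_row in zip(local, global_)
    ]
```

In this sketch, a node with no static edges still receives information from every other node through the dense attention graph, which is the intuition behind using a dynamic graph to supply global structure that local message passing alone would miss.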

