An Extractive-and-Abstractive Framework for Source Code Summarization

(Source) code summarization aims to automatically generate natural-language summaries (comments) for a given code snippet. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization techniques can be categorized into extractive methods and abstractive methods. Extractive methods use retrieval techniques to extract a subset of important statements and keywords from the code snippet, and generate a summary that preserves the factual details in those statements and keywords. However, such a subset may miss identifier or entity names, and consequently the naturalness of the generated summary is usually poor. Abstractive methods can generate human-written-like summaries by leveraging encoder-decoder models from neural machine translation. However, the generated summaries often miss important factual details. To generate human-written-like summaries that preserve factual details, we propose a novel extractive-and-abstractive framework. The extractive module in the framework performs extractive code summarization: it takes in the code snippet and predicts the important statements that contain key factual details. The abstractive module in the framework performs abstractive code summarization: it takes in the entire code snippet and the important statements in parallel and generates a succinct, human-written-like natural-language summary. We evaluate the effectiveness of our technique, called EACS, through extensive experiments on three datasets covering six programming languages. Experimental results show that EACS significantly outperforms state-of-the-art techniques on all three widely used metrics: BLEU, METEOR, and ROUGE-L.
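
To make the data flow between the two modules concrete, the following is a minimal sketch in Python of how an extractive step could feed the abstractive step. It shows only the plumbing, not the models: EACS uses learned neural components for both modules, whereas here the line-per-statement splitter, the `toy_score` heuristic, the `0.5` threshold, and all names (`AbstractiveInput`, `extract_important`, `build_abstractive_input`) are hypothetical assumptions introduced for illustration. The key point it mirrors from the text is that the abstractive module receives the full snippet and the selected statements side by side, not the selected statements alone.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AbstractiveInput:
    """The two inputs the abstractive module consumes in parallel."""
    code: str              # the entire code snippet
    important: List[str]   # statements selected by the extractive module


def split_statements(code: str) -> List[str]:
    # Hypothetical simplification: one non-empty line = one statement.
    return [line.strip() for line in code.splitlines() if line.strip()]


def extract_important(code: str,
                      score: Callable[[str], float],
                      threshold: float = 0.5) -> List[str]:
    # Extractive step: keep statements whose predicted importance reaches
    # the threshold, preserving their original order in the snippet.
    return [s for s in split_statements(code) if score(s) >= threshold]


def build_abstractive_input(code: str,
                            score: Callable[[str], float],
                            threshold: float = 0.5) -> AbstractiveInput:
    # The abstractive module gets the full snippet AND the extracted
    # statements, rather than the extracted statements alone.
    return AbstractiveInput(code=code,
                            important=extract_important(code, score, threshold))


def toy_score(stmt: str) -> float:
    # Hypothetical stand-in for a learned statement classifier:
    # treat return statements as important, everything else as not.
    return 1.0 if stmt.startswith("return") else 0.0


if __name__ == "__main__":
    snippet = "def area(r):\n    pi = 3.14159\n    return pi * r * r\n"
    inp = build_abstractive_input(snippet, toy_score)
    print(inp.important)  # ['return pi * r * r']
```

In a real system, `toy_score` would be replaced by the trained extractive classifier, and `AbstractiveInput` would be tokenized and passed to an encoder-decoder model that encodes both fields before decoding the summary.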
