Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting

Automatic code summarization frees software developers from the heavy burden of manual commenting and benefits software development and maintenance. The Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries. However, existing AST-based methods are difficult to train and generate inadequate code summaries. In this paper, we present the Block-wise Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes the rich tree-form syntax structure in ASTs, for improving code summarization. BASTS splits the code of a method based on the blocks in the dominator tree of the Control Flow Graph, and generates a split AST for each code split. Each split AST is then modeled by a Tree-LSTM using a pre-training strategy to capture local non-linear syntax encoding. The learned syntax encoding is combined with code encoding, and fed into a Transformer to generate high-quality code summaries. Comprehensive experiments on benchmarks have demonstrated that BASTS significantly outperforms state-of-the-art approaches in terms of various evaluation metrics. To facilitate reproducibility, our implementation is available at https://github.com/XMUDM/BASTS.
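The splitting step relies on the dominator tree of a method's Control Flow Graph. As a rough illustration of that concept only, the sketch below computes immediate dominators for a toy CFG with the textbook iterative dataflow algorithm. It is not the authors' implementation: the abstract does not say how BASTS derives blocks from the dominator tree, and the graph, node names, and function names here are all hypothetical.

```python
def dominators(cfg, entry):
    """Return {node: set of nodes dominating it} for a CFG given as
    {node: [successors]}. Assumes every node is reachable from entry."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom, entry):
    """Pick, for each node, its closest strict dominator (its parent in the
    dominator tree). Strict dominators form a chain, so the closest one is
    the one with the largest dominator set."""
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue
        strict = ds - {n}
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom


# Hypothetical CFG of a method with a single if/else.
TOY_CFG = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": [],
}
TOY_IDOM = immediate_dominators(dominators(TOY_CFG, "entry"), "entry")
```

In this toy graph, `then`, `else`, and `join` all have `cond` as their immediate dominator, because neither branch lies on every path to `join`. Grouping statements by subtrees of this tree is one plausible reading of "blocks in the dominator tree", but that reading is an assumption.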
