Code summaries help developers comprehend programs and reduce the time required to infer program functionality during software maintenance. Recent efforts resort to deep learning techniques, such as sequence-to-sequence models, to generate accurate code summaries, among which Transformer-based approaches have achieved promising performance. However, effectively integrating code structure information into the Transformer remains under-explored in this task domain. In this paper, we propose a novel approach named SG-Trans that incorporates code structural properties into the Transformer. Specifically, we inject local symbolic information (e.g., code tokens and statements) and global syntactic structure (e.g., the data flow graph) into the self-attention module of the Transformer as an inductive bias. To further capture the hierarchical characteristics of code, the local information and global structure are distributed across the attention heads of the lower and higher layers of the Transformer, respectively. Extensive evaluation shows the superior performance of SG-Trans over state-of-the-art approaches. Compared with the best-performing baseline, SG-Trans improves the METEOR score, a metric widely used for measuring generation quality, by 1.4% and 2.0% on two benchmark datasets, respectively.
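The core idea of injecting structural information into self-attention as an inductive bias can be illustrated with a minimal sketch. This is not the paper's actual implementation: the function name `masked_self_attention`, the single-head formulation, and the block-structured example mask are all illustrative assumptions. A "local" head would use a mask linking tokens within the same statement, while a "global" head would use a mask derived from data-flow edges.

```python
import numpy as np

def masked_self_attention(X, W_q, W_k, W_v, mask):
    """One attention head whose scores are restricted by a structural mask.

    Illustrative sketch (not SG-Trans's actual code): mask[i, j] = 1 if
    token j is structurally related to token i (e.g. same statement for a
    "local" head, a data-flow edge for a "global" head). Unrelated
    positions are set to -inf before the softmax, so each token can only
    attend to its structurally related neighbors. The mask is assumed to
    include the diagonal (every token relates to itself).
    """
    d_k = W_k.shape[1]
    # Scaled dot-product attention scores.
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d_k)
    # Structural inductive bias: block attention outside the mask.
    scores = np.where(mask.astype(bool), scores, -np.inf)
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ (X @ W_v)
```

Because the softmax output is a convex combination over the unmasked positions only, each token's output vector depends solely on its structurally related tokens, which is how the structural prior constrains the attention head.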