Background: Code summarization automatically generates natural language descriptions for input code. The comprehensiveness of the code representation is critical to the code summarization task. However, most existing approaches use coarse-grained fusion methods to integrate multi-modal features: they represent different modalities of a piece of code, such as an Abstract Syntax Tree (AST) and a token sequence, as two embeddings and then fuse them at the AST/code level. Such coarse integration makes it difficult to effectively learn the correlations between fine-grained code elements across modalities. Aims: This study aims to improve the model's prediction performance for high-quality code summarization by accurately aligning and fully fusing the semantic and syntactic structure information of source code at the node/token level. Method: This paper proposes a Multi-Modal Fine-grained Feature Fusion approach (MMF3) for neural code summarization. We introduce a novel fine-grained fusion method that fuses multiple code modalities at the token and node levels. Specifically, we use this method to fuse information from the token and AST modalities and apply the fused features to code summarization. Results: We conduct experiments on one Java and one Python dataset and evaluate the generated summaries using four metrics. The results show that: 1) our model outperforms the current state-of-the-art models, and 2) ablation experiments confirm that the proposed fine-grained fusion method effectively improves the accuracy of the generated summaries. Conclusion: MMF3 can mine the relationships between cross-modal elements and perform accurate fine-grained, element-level alignment and fusion accordingly. As a result, more clues can be provided to improve the accuracy of the generated code summaries.
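The abstract does not specify MMF3's exact fusion mechanism, but one common way to realize fine-grained, element-level alignment between a token sequence and AST nodes is cross-attention: each token attends over all AST-node embeddings, so node information is aligned to and merged with individual tokens rather than fused as two pooled code-level vectors. The sketch below is a minimal illustration of that general idea (the function name, dimensions, and residual-style merge are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(tokens, nodes):
    """Illustrative token/node-level fusion (not MMF3's exact method):
    each token computes alignment scores against every AST node via
    scaled dot-product attention, then merges the attended node
    context back into its own embedding."""
    d = tokens.shape[-1]
    scores = tokens @ nodes.T / np.sqrt(d)  # (n_tokens, n_nodes) alignments
    weights = softmax(scores, axis=-1)      # per-token distribution over nodes
    context = weights @ nodes               # node info aligned to each token
    return tokens + context                 # element-level (residual) fusion

rng = np.random.default_rng(0)
tok = rng.normal(size=(5, 16))  # 5 code-token embeddings, 16-dim
ast = rng.normal(size=(8, 16))  # 8 AST-node embeddings, 16-dim
fused = cross_attention_fuse(tok, ast)
```

Note that the output keeps one fused vector per token, so downstream summarization decoders can still attend to individual code elements, unlike coarse AST/code-level fusion, which collapses each modality to a single embedding before combining them.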