A source code summary of a subroutine is a brief description of that subroutine. Summaries underpin much of the documentation that programmers consume, such as the method summaries in JavaDocs. Source code summarization is the task of writing these summaries. At present, most state-of-the-art approaches to code summarization are neural network-based solutions such as seq2seq, graph2seq, and other encoder-decoder architectures. The encoder takes source code as input, and the decoder predicts the natural language summary. While these models tend to be similar in structure, evidence is emerging that different models make different contributions to prediction quality: differences in model performance are orthogonal and complementary rather than uniform over the entire dataset. In this paper, we explore the orthogonal nature of different neural code summarization approaches and propose ensemble models that exploit this orthogonality for better overall performance. We demonstrate that a simple ensemble strategy boosts performance by up to 14.8%, and we provide an explanation for this boost. The takeaway from this work is that a relatively small change to the inference procedure in most neural code summarization techniques leads to outsized improvements in prediction quality.
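To make the "small change to the inference procedure" concrete, the sketch below illustrates one simple ensemble strategy: averaging the per-token output distributions of two independently trained summarization models at each greedy decoding step. This is a minimal illustration, not the paper's exact implementation; the predict_next methods and the vocab list are hypothetical placeholders rather than APIs from the paper or any specific library.

    # Minimal sketch of mean-ensemble decoding for code summarization.
    # Assumptions (not from the paper): each model exposes a hypothetical
    # predict_next(code_tokens, prefix) method returning a softmax
    # distribution over a shared output vocabulary `vocab`.
    import numpy as np

    def ensemble_greedy_decode(model_a, model_b, code_tokens, vocab,
                               max_len=30, start="<s>", end="</s>"):
        summary = [start]
        for _ in range(max_len):
            # Per-token probability distribution from each model,
            # given the source code and the summary generated so far.
            p_a = model_a.predict_next(code_tokens, summary)  # shape: (|vocab|,)
            p_b = model_b.predict_next(code_tokens, summary)  # shape: (|vocab|,)
            p_mean = (p_a + p_b) / 2.0                        # mean ensemble
            next_token = vocab[int(np.argmax(p_mean))]
            if next_token == end:
                break
            summary.append(next_token)
        return summary[1:]  # drop the start token

Because the only change is how the next-token distribution is computed, any set of encoder-decoder summarization models that share an output vocabulary can be combined this way at inference time, without retraining the individual models.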