Source code summarization is the task of generating a high-level natural language description for a segment of programming language code. Current neural models for this task differ in their architecture and in the aspects of code they consider. In this paper, we show that three SOTA models for code summarization work well on largely disjoint subsets of a large codebase. This complementarity motivates model combination: we propose three meta-models that select the best candidate summary for a given code segment. The two neural meta-models improve significantly over the performance of the best individual model, obtaining an improvement of 2.1 BLEU points on the subset of code segments for which at least one of the individual models obtains a non-zero BLEU score.