Transformer architectures have been used successfully to learn source code representations. However, fusing a graph representation such as the Abstract Syntax Tree (AST) with the source code token sequence makes current approaches computationally intractable for long input sequences. Source code can contain long-range dependencies that require large sequence lengths to model effectively, yet the computational and memory costs of current approaches grow quadratically with sequence length, which makes such models difficult to use in practical scenarios. In this work, we propose conditioning a source code snippet on its graph modality by using the graph adjacency matrix as an attention mask for a sparse self-attention mechanism, and by using a graph diffusion mechanism to model longer-range token dependencies. Our model reaches state-of-the-art results in BLEU, METEOR, and ROUGE-L metrics on the code summarization task and near state-of-the-art accuracy on the variable misuse task. The memory use and inference time of our model grow linearly with the input sequence length, compared to the quadratic growth of previous works.
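To make the core idea concrete, the following is a minimal sketch, assuming a PyTorch setting, of attention restricted by an AST adjacency matrix together with a simple power-series diffusion that widens the mask to multi-hop neighbors. The function names (`adjacency_masked_attention`, `graph_diffusion`) and the specific normalization and diffusion depth are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def adjacency_masked_attention(x, adj, w_q, w_k, w_v):
    """Self-attention restricted to graph edges: token i attends to token j
    only where adj[i, j] == 1 (self-loops are always allowed).

    x:   (seq_len, d_model) token representations
    adj: (seq_len, seq_len) 0/1 adjacency matrix derived from the AST
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
    mask = adj.bool() | torch.eye(adj.size(-1), dtype=torch.bool, device=adj.device)
    scores = scores.masked_fill(~mask, float("-inf"))  # block non-edges
    return F.softmax(scores, dim=-1) @ v

def graph_diffusion(adj, steps=2, alpha=0.5):
    """Illustrative graph diffusion: sum decayed powers of the row-normalized
    adjacency so that tokens a few hops apart become reachable in the mask."""
    a = adj / adj.sum(-1, keepdim=True).clamp(min=1.0)
    diffused = torch.zeros_like(a)
    power = torch.eye(a.size(-1), device=a.device)
    for k in range(steps + 1):
        diffused = diffused + (alpha ** k) * power
        power = power @ a
    return (diffused > 0).float()  # widened 0/1 mask for longer-range dependencies
```

In practice, a sparse mask of this kind keeps the number of attended positions per token roughly proportional to the node degree rather than to the full sequence length, which is what allows memory and compute to scale linearly instead of quadratically.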