Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT have achieved great success in many downstream code-related tasks. Since the memory and computational complexity of self-attention in the Transformer grow quadratically with the sequence length, PPLMs typically limit the code length to 512 tokens. However, code in real-world applications, such as code search, is generally long and cannot be processed efficiently by existing PPLMs. To address this problem, in this paper we present SASA, a Structure-Aware Sparse Attention mechanism that reduces the complexity and improves performance on long code understanding tasks. The key components of SASA are top-$k$ sparse attention and Abstract Syntax Tree (AST)-based structure-aware attention. With top-$k$ sparse attention, the most crucial attention relations can be obtained at a lower computational cost. Since the code structure represents the logic of the code statements and complements the sequence characteristics of code, we further introduce AST structures into attention. Extensive experiments on CodeXGLUE tasks show that SASA achieves better performance than the competing baselines.
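To illustrate the two attention components named in the abstract, the following is a minimal sketch, not the authors' implementation: it keeps, for each query, only the top-$k$ highest-scoring keys plus any token pairs marked as related by an AST-derived mask, then applies softmax attention over the retained positions. All names (`top_k_sparse_attention`, `ast_mask`) are illustrative, PyTorch is assumed, and the dense score matrix is computed here only to show the attention pattern; an efficient implementation would avoid materializing it.

```python
import torch
import torch.nn.functional as F

def top_k_sparse_attention(q, k, v, top_k=32, ast_mask=None):
    """q, k, v: (batch, heads, seq_len, head_dim).
    ast_mask: (batch, seq_len, seq_len) boolean mask, True where two tokens
    are related in the AST (hypothetical input, built outside this sketch)."""
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (B, H, L, L)

    # Top-k sparse attention: keep only the k highest-scoring keys per query.
    kth = torch.topk(scores, k=min(top_k, scores.size(-1)), dim=-1).values[..., -1:]
    keep = scores >= kth

    # Structure-aware attention: additionally keep AST-related token pairs.
    if ast_mask is not None:
        keep = keep | ast_mask.unsqueeze(1)  # broadcast over heads

    scores = scores.masked_fill(~keep, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)

# Usage with random tensors (shapes chosen arbitrarily for illustration):
B, H, L, D = 2, 8, 1024, 64
q, k, v = (torch.randn(B, H, L, D) for _ in range(3))
ast = torch.zeros(B, L, L, dtype=torch.bool)
out = top_k_sparse_attention(q, k, v, top_k=32, ast_mask=ast)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```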