Code summarization (CS) is becoming a promising area in language understanding: it aims to automatically generate sensible natural-language descriptions for programs given as source code, which is of great convenience to developers. Programming languages are well known to be highly structured, so previous works have applied structure-based traversal (SBT) or non-sequential models such as Tree-LSTM and graph neural networks (GNNs) to learn structural program semantics. Surprisingly, however, incorporating SBT into an advanced encoder such as Transformer, rather than LSTM, has been shown to yield no performance gain, leaving GNNs as the only remaining means of modeling this necessary structural clue in source code. To remedy this inconvenience, we propose the structure-induced Transformer, which encodes sequential code inputs together with multi-view structural clues through a newly proposed structure-induced self-attention mechanism. Extensive experiments show that our structure-induced Transformer achieves new state-of-the-art results on benchmarks.
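The core idea of structure-induced self-attention can be sketched as standard scaled dot-product attention restricted by a structural adjacency mask. The sketch below is illustrative only, not the paper's implementation: it assumes structure enters as a single 0/1 adjacency matrix built from one structural view of the code (e.g. AST edges), and the function name is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structure_induced_attention(Q, K, V, A):
    """Self-attention restricted by a structural adjacency mask.

    Q, K, V: (n, d) query/key/value matrices for n code tokens.
    A: (n, n) 0/1 adjacency derived from a structural view of the
       code (e.g. AST or data-flow edges); 1 keeps the attention link.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Mask out token pairs with no structural edge between them.
    scores = np.where(A > 0, scores, -1e9)
    return softmax(scores, axis=-1) @ V
```

In a multi-view setting, one such mask per structural view could be applied in parallel attention heads and the results combined, so the encoder still consumes the plain token sequence while the masks inject structure.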