Generating code from textual program descriptions has long been viewed as a grand challenge in software engineering. In recent years, many deep-learning-based approaches have been proposed that generate a code sequence from a textual program description treated as an input sequence. However, existing approaches ignore the global relationships among API methods, which are important for understanding how APIs are used. In this paper, we propose to model the dependencies among API methods as an API dependency graph (ADG) and to incorporate an embedding of this graph into a sequence-to-sequence (Seq2Seq) model. In addition to the existing encoder-decoder structure, we introduce a new module named "embedder". With it, the decoder can use both the global structural dependencies and the textual program description to predict the target code. We conduct extensive code generation experiments on three public datasets covering two programming languages (Python and Java). Our proposed approach, called ADG-Seq2Seq, yields significant improvements over existing state-of-the-art methods and maintains its performance as the length of the target code increases. Extensive ablation tests show that the proposed ADG embedding is effective and outperforms the baselines.
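To make the notion of an API dependency graph concrete, the following is a minimal sketch rather than the construction used in the paper. It assumes, purely for illustration, that a dependency edge runs from method u to method v whenever the output type of u matches an input type of v; the abstract does not state how edges are defined. The signature table `API_SIGNATURES` and the helper `build_adg` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical API signatures: method name -> (input types, output type).
# These entries are illustrative and not taken from the paper's datasets.
API_SIGNATURES = {
    "open": (("str",), "File"),
    "File.read": (("File",), "str"),
    "str.split": (("str",), "list"),
    "len": (("list",), "int"),
}


def build_adg(signatures):
    """Build adjacency lists for a directed dependency graph.

    Assumption: an edge u -> v exists when u's output type appears
    among v's input types (u can feed its result into v).
    """
    graph = defaultdict(list)
    for u, (_, out_type) in signatures.items():
        for v, (in_types, _) in signatures.items():
            if u != v and out_type in in_types:
                graph[u].append(v)
    return dict(graph)


adg = build_adg(API_SIGNATURES)
for method, successors in adg.items():
    print(method, "->", successors)
```

A graph of this kind is what a graph-embedding step would consume to produce one vector per API method, which a decoder could then consult alongside the encoded description.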