Deep code generation is a topic in deep learning for software engineering (DL4SE) that uses neural models to generate code for intended functions. Because end-to-end neural methods lack awareness of domain knowledge and software hierarchy, their results often require manual correction. To systematically explore potential improvements to code generation, we let it participate in the whole top-down development process, from intentions to realizations, which is feasible in limited scopes. In this process, code generation benefits from massive samples, features, and knowledge. As the foundation, we suggest building a taxonomy of code data, namely a code taxonomy, that leverages the categorization of code information. Moreover, we introduce a three-layer semantic pyramid (SP) to associate text data with code data. The SP identifies information at different abstraction levels, and thus introduces domain knowledge about development and reveals the hierarchy of software. Furthermore, we propose a semantic pyramid framework (SPF) as the approach, focusing on software of high modularity and low complexity. SPF divides the code generation process into stages and reserves spots for potential interactions. Finally, we envision application scopes for SPF.