In recent years, the rise of deep learning and of automation requirements in the software industry has elevated Intelligent Software Engineering to new heights. The number of approaches and applications in code understanding is growing, and many of them use deep learning techniques to better capture the information in code data. In this survey, we present a comprehensive overview of the structures formed from code data. We categorize recent code understanding models into two groups, sequence-based and graph-based models, and then summarize and compare them. We also introduce metrics, datasets, and downstream tasks. Finally, we offer some suggestions for future research in the field of structural code understanding.