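Project title: Research on Code Topic Modeling Techniques for Functionality Mining
Project number: No.61472007
Project type: General Program
Year approved: 2015
Discipline: Automation technology, computer technology
Project author: 赵俊峰 (Zhao Junfeng)
Author affiliation: Peking University (北京大学)
Funding: 800,000 CNY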
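Chinese abstract: In code reuse, developers need to understand what a piece of software does and how its code implements that functionality. In recent years, code comprehension based on topic modeling has become an active research area. Most existing work treats code as ordinary text and directly applies topic modeling techniques designed for ordinary text, without accounting for the characteristics of code. In addition, the mined topics have unclear semantics and topics of several types are mixed together, which makes them hard for developers to understand and apply. This project takes functional topics as its core and studies code topic modeling techniques for functionality mining. First, building on topic modeling techniques for ordinary text, it improves them by incorporating the static structure and dynamic behavior of software code, proposes topic modeling techniques suited to code, and studies techniques for distinguishing topic types and identifying functional topics. Next, it constructs a description model for topics and their associated entities, and on that basis studies techniques for describing the semantics of functional topics and establishes topic-to-topic and topic-to-entity relationships. Finally, it studies topic-based application techniques such as code comprehension, software classification, and domain analysis, develops corresponding prototype systems, and validates the techniques using open source software data and enterprise practice.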
Chinese keywords: code comprehension;code reuse;topic modeling
English abstract: Before reusing source code, developers need to comprehend the functional concerns of a software system and their implementation in the source code. Recently, topic modeling-based source code comprehension has become an active research topic. Most previous approaches treat source code as plain text written in natural language and reuse topic modeling techniques designed for plain text, neglecting the distinct characteristics of source code. It is difficult to determine the semantics of the topics mined from source code, and topics of different categories are mixed together. Consequently, it is difficult for developers to comprehend and apply the topics. In this project, we conduct research on topic modeling techniques for mining functional concerns from source code, with a focus on functional topics. First, based on topic modeling techniques for plain text, we make improvements that exploit the characteristics of source code to construct new topic modeling techniques better suited to source code, and we study how to categorize the topics mined from source code and identify functional topics. Then, we propose a new model to describe topics and their associated artifacts. Based on this model, we further study techniques for describing the semantics of functional topics and for establishing relationships among topics and between topics and their associated artifacts. Finally, we study mechanisms for developers to apply topics in software comprehension, software categorization, domain analysis, etc., implement a prototype system, and evaluate the effectiveness of our approach with open source software data and enterprise practice.
English keywords: source code comprehension;source code reuse;topic modeling