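Project title: Research on RDF-Based Storage and Retrieval Techniques for Software Engineering Data
Project number: No.60803022
Project type: Young Scientists Fund project
Approval year: 2009
Project discipline: Metallurgy and Metal Technology
Project author: Gong Xueqing (宫学庆)
Affiliation: East China Normal University
Funding: 180,000 yuan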
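Chinese abstract (translated): This project studied RDF-based management of software engineering data. The project team collected and organized various types of software engineering data, applied semantic annotation and extraction to them, and integrated the result with the public semantic dataset DBpedia to form a massive RDF dataset. On this dataset the team completed the following work: 1. Designed an OWL-based description model for software engineering data that describes source code, requirements, tests, versions, and defect data, as well as the semantic associations among them; 2. Proposed a hash-based solution for storing and querying RDF data, in which RDF triples are parsed and stored in relational tables, the incoming and outgoing edges of each node are hashed into a binary vector, and at query time hashing is used to locate each non-leaf node of the query graph quickly, which improves retrieval efficiency; 3. Implemented a distributed RDF data processing engine based on cluster computing that supports storing and querying massive RDF data. The system was granted a software copyright certificate. 4. Validated the effectiveness of the research by managing and analyzing defect report data from the development of a large software system. In summary, the project's research objectives were clear, the research progressed smoothly, project management complied with the relevant regulations, and the results met the research objectives listed in the project task statement.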
Chinese keywords (translated): Software engineering data; RDF/OWL; data modeling; query processing
English abstract: This project focused on RDF-based management of software engineering data. We crawled and collected data from the web and from a software development company. After preprocessing steps such as information extraction and semantic annotation, we integrated these data with the public semantic dataset DBpedia to form a massive RDF dataset. Based on this dataset, we completed the following work: 1) Designed an OWL model for software engineering data that describes not only source code, requirements, testing, version information, and bug reports, but also the semantic associations among these data; 2) Proposed a hash-based solution for storing and querying RDF data. We first parse the RDF triples and split them into several relational tables; the incoming and outgoing edges of each node are then mapped to a binary vector by a hash function. During query processing, hashing lets us quickly locate each internal (non-leaf) node of the query graph. 3) Implemented a cluster-based distributed RDF data processing engine that supports the storage and querying of massive RDF data. This system has been granted a software copyright certificate. 4) Verified the effectiveness of our research by analyzing bug report data from the development of a large software system. In summary, the research objectives of this project were clear, and our research results meet the requirements of the project mission statement.
English keywords: Software Engineering Data; RDF/OWL; Data Modeling; Query Processing
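
Illustrative sketch (not part of the project record): the abstract describes hashing each node's incoming and outgoing edges into a binary vector and using it to locate the non-leaf nodes of a query graph. The record gives no implementation details, so the Python below is only one plausible reading. The signature width `SIG_BITS`, the use of SHA-1, the class `NodeIndex`, and the sample triples and predicate names are all assumptions made for illustration. The idea shown is that a data node can match a query node only if its bit vector contains every bit of the query node's vector; because hash collisions can produce false positives, the candidates are then checked exactly.

```python
import hashlib
from collections import defaultdict

SIG_BITS = 64  # assumed signature width; the record does not specify one


def edge_bit(predicate: str, direction: str) -> int:
    """Hash an edge label plus its direction ('in' or 'out') to a single bit."""
    digest = hashlib.sha1(f"{direction}:{predicate}".encode("utf-8")).digest()
    return 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)


class NodeIndex:
    """Hypothetical per-node index: a bit-vector signature plus exact edge sets."""

    def __init__(self, triples):
        self.sig = defaultdict(int)
        self.out_preds = defaultdict(set)
        self.in_preds = defaultdict(set)
        for s, p, o in triples:
            # The subject gains an outgoing edge, the object an incoming edge.
            self.sig[s] |= edge_bit(p, "out")
            self.out_preds[s].add(p)
            self.sig[o] |= edge_bit(p, "in")
            self.in_preds[o].add(p)

    def candidates(self, out_preds=(), in_preds=()):
        """Signature filter: never drops a true match, may keep false positives."""
        q = 0
        for p in out_preds:
            q |= edge_bit(p, "out")
        for p in in_preds:
            q |= edge_bit(p, "in")
        return [n for n, sig in self.sig.items() if sig & q == q]

    def matches(self, out_preds=(), in_preds=()):
        """Exact check applied only to the nodes that pass the signature filter."""
        need_out, need_in = set(out_preds), set(in_preds)
        return sorted(
            n
            for n in self.candidates(out_preds, in_preds)
            if need_out <= self.out_preds[n] and need_in <= self.in_preds[n]
        )


# Made-up sample data in the spirit of the defect-report use case.
triples = [
    ("bug:1", "affects", "file:Parser.java"),
    ("bug:1", "reportedBy", "dev:alice"),
    ("bug:2", "affects", "file:Lexer.java"),
    ("commit:9", "fixes", "bug:1"),
]
index = NodeIndex(triples)
fixed_bugs = index.matches(out_preds=["affects"], in_preds=["fixes"])
```

In this sketch the signature test is a single bitwise AND per node, so most non-matching nodes are discarded before any edge sets are compared. A relational implementation such as the one the abstract mentions would presumably store the vectors alongside the triple tables, but that layout is not described in the record.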