Project title: Research on Key Technologies for Large-Scale Lattice-Structured Data Management (大规模格结构数据管理关键技术研究)
Project number: No.61462050
Project type: Regional Science Fund Project
Year of approval: 2015
Discipline: Automation technology, computer technology
Project author: You Jinguo (游进国)
Institution: Kunming University of Science and Technology
Funding: 440,000 CNY
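Chinese abstract: Data cube lattices and concept lattices are two important data models in data warehousing, data mining, knowledge discovery, and related fields, and their instances are lattice-structured data. Storing and querying large-scale lattice-structured data, however, remains a challenge. This project treats lattice-structured data as graph-structured data and takes its statistical properties and regularities as the starting point for studying generative and analytical models of lattice-structured data. On this basis, it studies partitioning methods, distributed storage, distributed construction, and distributed query techniques for large-scale lattice-structured data. Experimental statistical methods, classical analytical models such as complex networks, and the conceptual hierarchy of lattices are used to build a fairly complete account of the mechanisms underlying lattice-structured data. Combined with current graph partitioning methods and distributed in-memory computing, this groundwork is used to develop management methods and techniques for large-scale lattice-structured data, and an analysis platform and a data platform are built for case validation and analysis. The results are expected to enable efficient querying and analysis of lattice-structured data with one million to ten million nodes, and to yield sound theoretical results on the properties, models, and partitioning methods of lattice-structured data.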
Chinese keywords: data management; lattice; graph partitioning; distributed memory; large scale
English abstract: Data cube lattices and concept lattices are two important models in data warehousing, data mining, knowledge discovery, and related fields, and their instances are lattice-structured. How to store and query massive lattice-structured data is still a major challenge. To address this issue, lattice-structured data are treated as graph data, and their intrinsic statistics and laws are studied first. The generative and analytical models and the underlying mechanisms are then discussed. On this basis, partitioning, storage, construction, and querying across multiple nodes are designed. Experimental statistical methods, classic models such as complex networks, and concept hierarchies are employed to build the mechanism of lattice-structured data. Graph partitioning and distributed in-memory computing are also leveraged to develop large-scale lattice-structured data management. A corresponding analysis platform and an open data platform are constructed, and sample application data sets are selected to demonstrate the theory. Massive lattice-structured data with one million to ten million nodes are expected to be queried and analyzed efficiently, and better theoretical results may be achieved on the characteristics, models, and partitioning methods of lattice-structured data.
英文关键词: data management;lattice;graph partitioning;distributed memory;large scale
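The abstract's central move is to treat lattice-structured data as graph data so that graph partitioning can be applied. A minimal Python sketch, assuming a data cube lattice over a few illustrative dimensions, shows what this means: each cuboid is a node, each one-dimension roll-up is an edge, and a partition's cost is measured by its edge cut (edges whose endpoints land on different machines). The dimension names and the two partitioning functions, `split_on_dimension` and `round_robin`, are hypothetical illustrations, not the partitioning methods developed in this project.

```python
from itertools import combinations


def cube_lattice(dims):
    """Build the data cube lattice over `dims` as a directed graph.

    Each node is a cuboid: a frozenset of group-by dimensions.
    An edge (parent, child) means `child` is obtained from `parent`
    by aggregating away exactly one dimension (one roll-up step).
    """
    nodes = [frozenset(c)
             for r in range(len(dims) + 1)
             for c in combinations(dims, r)]
    edges = [(n, n - {d}) for n in nodes for d in n]
    return nodes, edges


def edge_cut(edges, part):
    """Count edges whose endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def split_on_dimension(nodes, dim):
    """Hypothetical 2-way partition: cuboids containing `dim` go to part 0."""
    return {n: 0 if dim in n else 1 for n in nodes}


def round_robin(nodes, k):
    """Hypothetical baseline k-way partition that ignores the graph structure."""
    return {n: i % k for i, n in enumerate(nodes)}


if __name__ == "__main__":
    dims = ["time", "region", "product", "customer"]  # illustrative dimensions
    nodes, edges = cube_lattice(dims)
    print(len(nodes), len(edges))                               # 16 32
    print(edge_cut(edges, split_on_dimension(nodes, "time")))   # 8
    print(edge_cut(edges, round_robin(nodes, 2)))               # 20
```

With four dimensions the lattice has 2^4 = 16 cuboids and 4 × 2^3 = 32 roll-up edges. Splitting on one dimension keeps the two parts balanced at 8 nodes each and cuts only the 8 edges that aggregate that dimension away, whereas round-robin assignment cuts 20 of the 32 edges. At the one-to-ten-million-node scale the abstract targets, a dedicated graph partitioner would replace these toy heuristics.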