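Project title: Research on Key Techniques for Declarative Queries over Big Graph Data Based on Distributed Computing Frameworks
Project number: No.61272156
Project type: General Program
Year of approval: 2013
Discipline: Automation Technology; Computer Technology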
Principal investigator: Gao Jun (高军)
Affiliation: Peking University (北京大学)
Funding: 820,000 CNY
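Chinese abstract (translated): Big graph data and its applications pose enormous challenges and opportunities for data management technology. Using the relatively mature MapReduce distributed computing framework to manage big graph data is a practical and feasible approach. Under existing frameworks, query performance on big graph data falls short of application needs, and writing graph processing scripts is tedious and inefficient for users. To address these problems, this project studies declarative queries on big graph data under the MapReduce framework at two levels: theoretical methods and key techniques, and a prototype system. At the level of theoretical methods and key techniques, the project plans to propose a declarative graph query language based on recursive Datalog, reducing the cost to end users of writing graph operation scripts; to propose a method for constructing execution plans for declarative graph queries on the MapReduce framework, with optimization and dynamic binding strategies based on a cost model; and to propose a caching strategy for loop invariants in the MapReduce framework, together with an adaptive task allocation mechanism for load balancing, extending the distributed computing framework's support for big graph data management. At the prototype level, the project will develop a prototype system for declarative queries on big graph data based on Hadoop, the open-source implementation of the MapReduce distributed computing framework.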
Chinese keywords (translated): Graph query processing;Distributed framework;Declarative query;Pattern query;
English abstract: Big graphs and their applications pose enormous challenges and opportunities for database researchers. Exploiting the existing MapReduce framework to manage big graphs is a practical way to achieve high scalability. To improve graph query performance in the MapReduce framework and to lessen the burden on end users of coding and debugging distributed programs, this project plans to study the key techniques of declarative querying on big graphs using the MapReduce framework. Specifically, this project will design a declarative graph query language based on recursive Datalog to ease the burden on end users, propose a method to construct query evaluation plans using MapReduce, and devise query optimization and dynamic binding strategies based on a cost model. In addition, this project will study extensions to the underlying MapReduce framework, including a global caching mechanism for loop-invariant data in iterative MapReduce jobs and an adaptive partitioning strategy for load balancing on the reduce side. Finally, this project will build a prototype for the declarative graph query language based on Hadoop, an open-source implementation of the MapReduce framework.
English keywords: Graph Processing;Distributed Framework;Declarative Query;Pattern Query;