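Project title: Research on a High-Concurrency Graph Computing System Based on Data Sharing and Its Core Technologies
Project number: No.61472009
Project type: General Program
Year of approval: 2015
Discipline: Automation technology, computer technology
Principal investigator: Dai Yafei (代亚非)
Affiliation: Peking University
Funding: CNY 800,000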
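Abstract (Chinese): Graph data is one of the important data types in big data, and graph processing is a current research hotspot; graph processing systems such as Pregel, Giraph, GraphLab, and GraphX have emerged. Most of these systems follow a task-oriented processing model: a graph computation is decomposed into mutually independent tasks, and within each task the computation program and its data are tightly coupled. This model works as expected when concurrency is low. However, as applications continue to expand, more and more tasks must be processed concurrently, and binding data to computation becomes a performance bottleneck. Because the task-oriented model does not support data sharing, each task must load the data it needs, which often leaves redundant data occupying memory. Memory consumption is therefore high and the number of tasks that can run concurrently is very limited, which seriously hinders performance improvement in graph processing systems. This proposal puts forward a new data-oriented graph computing model, built on shared graph data, that aims to use memory effectively and support highly concurrent task execution, thereby improving the overall efficiency of graph computing. The project will study in detail the graph data management, stream computing model, and execution mechanisms and techniques needed to support a high-concurrency graph computing system.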
Keywords (Chinese): parallel computing; graph computing; big data processing; concurrency control; data sharing
Abstract (English): Graph data is one of the typical types of big data, and graph computing has become a research hotspot. Many graph computing systems have appeared, represented by Pregel, Giraph, GraphLab, and GraphX. The processing mode adopted by these systems is task oriented: a graph computing procedure is divided into a series of individual tasks, in each of which the processing procedure and the data are tightly coupled. This mode works well at low concurrency. However, as applications continue to expand, more and more tasks need to be processed concurrently, and the coupling of procedure and data becomes the bottleneck for efficiency. Because the task-oriented model does not support data sharing, each task has to store its own graph data in memory. The resulting redundant graph data exhausts memory, which severely limits the number of tasks that can be processed concurrently and heavily impedes performance improvement in graph computing systems. In this proposal, we present a novel data-oriented graph computing model based on data sharing, which aims to use memory efficiently, support highly concurrent task processing, and increase the overall efficiency of graph computing systems. Toward this aim, we will study in depth the graph data management method, the stream computing model, the execution mechanism, and related core technologies.
Keywords (English): parallel computing; graph computing; big data processing; concurrency control; data sharing