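Project title: Research on a Hypergraph-Partitioning-Based Task Scheduling Mechanism for Parallel Spatial Data Processing
Project number: No.41301411
Project type: Young Scientists Fund
Approval year: 2014
Discipline: Astronomy, Earth Sciences
Project author: Guan Xuefeng (关雪峰)
Affiliation: Wuhan University
Funding: 250,000 yuan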
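Chinese abstract: Processing massive spatial data in parallel on distributed computing platforms is a current research focus. Existing parallel spatial data processing systems lack a mature, general-purpose task scheduling model and strategy; most directly borrow scheduling methods based on directed acyclic graphs (DAGs). These models, however, cannot accommodate massive data elements, and the strategies ignore data I/O cost, so parallel computing resources cannot be used efficiently. This project studies task scheduling mechanisms for the parallel processing of massive spatial data. First, hypergraph theory is used to build a scheduling model that integrates tasks, data, and platform, taking into account the spatial proximity of data and the hierarchy of tasks, with the aim of solving the model-completeness problem. Next, task load balance and minimal transferred data volume are set as the scheduling objectives, and a task scheduling strategy based on hypergraph partitioning is formulated. The strategy is then optimized by exploiting the fact that processing algorithms involve only local spatial data, which reduces its time complexity. Finally, a distributed scheduling prototype system is designed and developed to validate the scheduling model and strategy. The study and application of this scheduling mechanism are expected to substantially reduce I/O transfer cost in spatial data processing, shorten overall processing time, improve parallel processing efficiency, and enable the rapid conversion of data into information.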
Chinese keywords: parallel computing; spatial analysis; task scheduling; hypergraph;
English abstract: In recent years, improvements in spatial data acquisition technologies have led to explosive growth in the volume of spatial data, posing unprecedented challenges to current computing capacity. High-performance clusters are the only economically viable solution for real-time data processing. However, massive spatial data processing involves heavy I/O operations and should be characterized as a data-intensive application. The parallelization strategies of data-intensive applications, such as decomposition, scheduling, and load balancing, differ substantially from those of traditional compute-intensive applications. It is therefore important to develop a new scheduling model and strategy for parallel spatial data processing. This proposal studies the scheduling of massive spatial data processing. First, the characteristics of data decomposition and task collection will be evaluated, including task precedence, input dependence, and data transmission. A hypergraph-based scheduling model covering data, tasks, and platform will be constructed. This model can faithfully represent the entire data processing workflow. Second, a task scheduling strategy will be designed from the hypergraph-based scheduling model. The scheduling problem can be resolved by partitioning the constructed hypergraph model
English keywords: parallel computing; spatial analysis; task scheduling; hypergraph;
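
The scheduling objective stated in the abstract (balanced task load with minimal transferred data) can be illustrated with a toy sketch. It is not the project's actual model or algorithm, which the record does not detail. The sketch assumes that tasks are hypergraph vertices weighted by compute load, and that each data block is a hyperedge (net) joining the tasks that read it, weighted by block size. Transfer cost is approximated by the common connectivity-minus-one metric, and a simple greedy heuristic stands in for a real hypergraph partitioner. All names (`greedy_partition`, `connectivity_cost`, `task_load`, `net_size`) are hypothetical.

```python
from collections import defaultdict


def connectivity_cost(nets, net_size, part):
    """Connectivity-1 metric: each net pays its size once per extra part it spans."""
    cost = 0
    for net, tasks in nets.items():
        parts_touched = {part[t] for t in tasks}
        cost += net_size[net] * (len(parts_touched) - 1)
    return cost


def greedy_partition(task_load, nets, net_size, k, imbalance=0.1):
    """Greedily assign tasks to k nodes (illustrative heuristic only).

    Each task goes to the node that already holds the most data it shares
    with previously placed tasks, subject to a load cap of
    (1 + imbalance) * average load.
    """
    cap = (1 + imbalance) * sum(task_load.values()) / k
    nets_of = defaultdict(list)  # task -> nets (data blocks) it reads
    for net, tasks in nets.items():
        for t in tasks:
            nets_of[t].append(net)

    part = {}
    load = [0.0] * k
    # Place heavier tasks first; break ties by name for determinism.
    for t in sorted(task_load, key=lambda x: (-task_load[x], x)):

        def shared_data(p):
            # Total size of t's data blocks already needed on node p.
            return sum(net_size[n] for n in nets_of[t]
                       if any(part.get(u) == p for u in nets[n]))

        feasible = [p for p in range(k) if load[p] + task_load[t] <= cap]
        if not feasible:  # fall back to the least-loaded node
            feasible = [min(range(k), key=lambda p: load[p])]
        best = max(feasible, key=lambda p: (shared_data(p), -load[p], -p))
        part[t] = best
        load[best] += task_load[t]
    return part, load


# Toy instance: four unit-load tasks, three data blocks.
task_load = {"t1": 1, "t2": 1, "t3": 1, "t4": 1}
nets = {"A": ["t1", "t2"], "B": ["t3", "t4"], "C": ["t2", "t3"]}
net_size = {"A": 10, "B": 10, "C": 1}

part, load = greedy_partition(task_load, nets, net_size, k=2)
cost = connectivity_cost(nets, net_size, part)
print(part, load, cost)
```

In this toy instance the two large blocks A and B each stay on a single node, and only the small block C is shared across nodes. A production system would more likely rely on a dedicated multilevel hypergraph partitioner than on this greedy pass.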