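Project title: Research on Task-Acceleration-Oriented Data Scheduling Strategies in Massive Data Processing
Project number: No. 61300033
Project type: Young Scientists Fund project
Year of approval: 2014
Project discipline: Automation technology; computer technology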
Project author: Ren Zujie (任祖杰)
Author affiliation: Hangzhou Dianzi University (杭州电子科技大学)
Project funding: CNY 270,000
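Chinese abstract (translated): Data volumes are growing rapidly in many fields, including Internet applications, finance and telecommunications, and health care. Extracting the latent scientific or commercial value of this data requires efficient massive data processing systems. Optimizing task scheduling and optimizing data scheduling are two important ways to improve the performance of such systems. Traditional data scheduling concentrates on data placement, migration, copying, and replica management, with the aim of improving storage resource utilization and the quality of service of data access. Because these operations do not target the task execution process, they respond poorly to the data I/O optimization needs that arise during task execution. This project addresses the complex computation performed by massive data processing tasks. With the goals of reducing data I/O overhead and speeding up task execution, it studies task-acceleration-oriented data scheduling strategies, covering intelligent data prefetching, collaborative data transfer, and balanced data distribution. These strategies overcome the limitations of traditional data scheduling, substantially reduce data I/O overhead during task execution, and achieve efficient data scheduling between compute nodes and storage nodes, which is of great significance for improving massive data processing performance.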
Chinese keywords (translated): massive data processing; data scheduling; data prefetching; data transfer; data distribution
English abstract: With the rapid growth of data volume in many fields, such as Internet applications, finance and telecommunications, and health care, high-performance massive data processing techniques are required to handle such big data. Optimizing task scheduling and data scheduling has proven to be an effective way to improve the performance of massive data processing systems. Traditional data scheduling focuses on data storage, transfer, copying, and replica management, and aims to improve storage resource utilization and data access QoS rather than to serve task execution directly. This proposal revisits data scheduling from the perspective of task acceleration and explores strategies for data prefetching, parallel transfer, and distribution during task execution on a massive data processing framework. Data scheduling for task acceleration overcomes the poor responsiveness of traditional data scheduling to task execution, reduces data I/O cost during task execution, and enables efficient data scheduling between computation nodes and storage nodes, thereby improving the performance of massive data processing systems.
English keywords: Massive Data Processing; Data Scheduling; Data Prefetch; Data Transfer; Data Distribution