Research on In-Memory Data Caching for Map/Reduce Data Processing Platforms

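Project title: Research on In-Memory Data Caching for Map/Reduce Data Processing Platforms

Project number: No. 61202075

Project type: Young Scientists Fund project

Year of approval: 2013

Discipline: Computer Science

Project author: Liang Yi (梁毅)

Affiliation: Beijing University of Technology (北京工业大学)

Funding: 230,000 CNY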

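Chinese abstract (translated): Map/Reduce data processing platforms are the latest technical advance in massive data processing for data centers. Reducing the overhead of reading massive data at application run time, and thereby improving application execution efficiency, is key to ensuring the quality of service of a Map/Reduce platform. In-memory data caching is a typical class of techniques that data centers use to improve data access efficiency. However, existing data caching research is hard to adapt to the new characteristics of Map/Reduce platforms, namely that data is stored in a distributed fashion across the compute nodes and processed locally, and data caching research that targets Map/Reduce platforms remains unexplored. This project aims to develop in-memory data caching techniques for Map/Reduce data processing platforms. With the goal of improving application execution efficiency, and targeting the new data storage and processing model of Map/Reduce platforms, it focuses on key techniques including data access characteristic analysis methods, data prefetching and replacement, data re-placement, and cache-aware Map/Reduce task scheduling. The results will be analyzed and validated with a prototype system, providing a practical theoretical foundation and technical solution for introducing in-memory data caching into Map/Reduce platforms.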

Chinese keywords (translated): Map/Reduce; data caching; workload analysis; replacement strategy; job scheduling

English abstract: Map/Reduce is at the cutting edge of massive data processing frameworks in large-scale data centers. Reducing the I/O overhead of massive data processing is essential to improving the execution efficiency of Map/Reduce applications and, hence, the quality of service of data centers. In-memory data caching is a popular technique for improving data access rates in data centers by reducing disk I/O. However, when applied to Map/Reduce-style frameworks, existing in-memory data caching techniques cannot accommodate the frameworks' new features, namely that massive data is distributed among the computing nodes and that computation follows data locality. To address this issue, we focus on adapting and extending in-memory data caching to Map/Reduce-style frameworks, which is, to the best of our knowledge, original work in the field of Map/Reduce framework research. The main research topics include two-level data access characteristic analysis, data prefetching and replacement, recovery-cost-oriented data placement, and data-caching-aware task scheduling, which together constitute an integrated solution for in-memory data caching in Map/Reduce frameworks. Along with the in-depth research, a prototype system of Map/Reduce

English keywords: Map/Reduce; data caching; workload analysis; replacement strategy; job scheduling
