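Research on Data Replication Strategies Supporting High-Throughput Big Data Processing

Project title: Research on Data Replication Strategies Supporting High-Throughput Big Data Processing

Project number: No.61303054

Project type: Young Scientists Fund

Year approved: 2014

Discipline: Automation technology, computer technology

Principal investigator: 朱妤晴 (Zhu Yuqing)

Affiliation: Institute of Computing Technology, Chinese Academy of Sciences

Funding: 270,000 yuan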

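Abstract (translated from Chinese): This project studies how data replication strategies can support high throughput in big data processing. Big data is driving changes in society and even in the way scientific research is conducted. Big data processing is characterized by both massive data volume and complex processing operations. A high-throughput big data processing system sustains high throughput for both processing operations and processed data, making previously intractable complex problems solvable and making the results available in time to be useful. Data replication is one of the effective means of raising the throughput of large-scale systems that support big data processing, and it is also the core mechanism for ensuring the availability of such systems. Building on an abstraction of the common operation set of big data processing applications, the project models the relationship between operation/data throughput and data replication strategies, solves for the replication strategy attribute parameters that optimize throughput, and on that basis designs and implements a replication strategy that supports high-throughput big data processing. Approaching the problem from the perspective of throughput optimization, the project is expected to produce a design and implementation scheme for dynamic replication strategies that support high-throughput big data processing, providing new ideas and a technical foundation, with respect to replication strategy, for the design and construction of high-performance big data processing systems.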

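Keywords (translated from Chinese): big data system; distributed commit; automatic configuration; performance tuning; big data operation set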

English abstract: This proposal aims to devise and implement a replication strategy to support high-throughput Big Data processing. Big Data is now driving a transformation of society and even of scientific research methodology. Big Data processing is characterized by the high volume of data as well as the complexity and variety of processing methods. A high-throughput Big Data processing system promises high throughput of both Big Data processing operations and processed data volume. The high throughput of operations and data enables the solution of complex problems that cannot be solved in traditional systems, and the timely output of solutions. Replication is not only an effective measure to increase throughput, but also the core mechanism to guarantee availability in a large-scale system. To fulfill the task of designing a replication strategy that supports high-throughput Big Data processing, the proposal studies first how to abstract the common operations of Big Data processing techniques, then how to construct a model correlating the throughput of the executed operations (as well as the processed data volume) with the features of a replication strategy, and finally how to compute the feature values that optimize the throughput in the model. This project focuses the limited research resources on the restrict

English keywords: big data system; distributed commit; automatic configuration; performance tuning; big data operation set
