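Project title: Reconstruction Optimization Techniques for Large-Scale Erasure-Coded Storage Clusters
Project number: No.61300046
Project type: Young Scientists Fund project
Approval year: 2014
Discipline: Automation technology; computer technology
Principal investigator: Wan Shenggang (万胜刚)
Affiliation: Huazhong University of Science and Technology (华中科技大学)
Funding: CNY 260,000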
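Chinese abstract (translated): Large-scale storage clusters are the underlying platform of today's cloud storage and big data storage systems. Any of the thousands of software and hardware components they contain can fail at any time, causing serious problems such as service interruption or even data loss. With traditional multi-replica redundancy, operating costs rise sharply as data volume grows, so adopting erasure codes, which offer higher storage efficiency, becomes inevitable; traditional erasure codes, however, have poor reconstruction performance. To address this problem, we observe that the usage and failure patterns of redundant storage units, storage nodes, and network uplinks and downlinks within a cluster are strongly imbalanced. This project therefore studies how to schedule the abundant processing, transmission, and storage resources in a storage cluster in three ways: preferentially recovering data in low-reliability stripes, moderately delaying normal user requests to speed up degraded read requests, and exploiting erasure-code encoding rules to increase the parallelism of reconstruction within a stripe. On this basis it designs methods and mechanisms that accelerate data reconstruction in erasure-coded storage clusters, thereby improving overall system performance, availability, and reliability. The result is an effective reduction in the redundancy cost of cloud storage and big data storage systems while reliability is maintained.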
Chinese keywords (translated): Storage clusters;Reliability;Performance;Erasure codes;Solid-state drives
English abstract: As the infrastructure of cloud storage and big data storage systems, storage clusters are widely deployed in data centers. Typically, a storage cluster is composed of thousands of independent nodes and contains a large number of commodity software and hardware components. In such an environment, failures are not rare. These failures result in service interruption and data loss, which may disrupt the functioning of society as a whole. Therefore, redundancy schemes should be introduced to improve the availability and reliability of storage systems and thus reduce the risk brought by failures. As a data redundancy technique, erasure codes promise high availability and reliability at low cost. However, an erasure-coded storage cluster suffers from potential performance, availability, and reliability problems, which are caused by the traditional centralized reconstruction approach. To address these problems, we propose a series of approaches that speed up the reconstruction process by leveraging the abundant processing, transmission, and storage resources in storage clusters. Through these approaches, the performance, availability, and reliability of the systems can be improved. As a result, the redundancy cost of cloud storage and big data storage can be reduced by deploying such practical erasure-coded storage clusters.
English keywords: Storage Clusters;Reliability;Performance;Erasure Codes;SSD