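Project title: Research on Key Techniques for Multi-Level Redundancy Elimination Based on Semantic Mining in Backup Systems
Project number: No.61502190
Project type: Young Scientists Fund project
Year of approval: 2016
Discipline: Automation technology, computer technology
Principal investigator: Xia Wen (夏文)
Affiliation: Huazhong University of Science and Technology (华中科技大学)
Funding: 220,000 CNY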
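Chinese abstract (translated): As the volume of data in backup storage systems continues to grow, multi-level redundancy elimination, which combines data deduplication, delta compression, and traditional compression, has attracted increasing attention because it can detect and eliminate redundant data in large-scale backup systems at several levels: duplicate chunks, similar chunks, and duplicate strings. To address the index overhead, computation overhead, and data fragmentation that multi-level redundancy elimination introduces, the project proposes methods for analyzing and mining the semantic relationships between the distribution of multi-level redundant data and the users, versions, file attributes, and locality of backup data. Building on these methods, it studies (1) semantics-aware index organization and detection mechanisms for duplicate and similar data, to reduce index overhead; (2) a parallel computing strategy based on learning a computational model of multi-level redundancy elimination, together with a task scheduling mechanism based on redundancy workload prediction, to speed up computation; and (3) fragmentation elimination and restore cache replacement algorithms based on semantic mining of backup data, to improve restore performance after redundancy elimination. The project will provide new methods and approaches for research on multi-level redundancy elimination for data backup and promote wider adoption of the technique.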
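Chinese keywords (translated): Data deduplication; backup storage systems; delta compression; restore performance; redundant data elimination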
English abstract: With the growing amount of data in backup storage systems, multi-level redundancy elimination, which combines data deduplication, delta compression, and traditional compression techniques, can identify and eliminate redundant data at the levels of duplicate chunks, similar chunks, and duplicate strings respectively, and is therefore gaining increasing attention. To address the indexing overhead, computing overhead, and fragmentation issues that stem from multi-level redundancy elimination, we propose approaches to explore the relationships between the distribution of redundant data and backup data semantics, such as users, versions, file attributes, and data locality. We then propose a backup-data-semantics-aware indexing scheme for multi-level redundancy elimination to reduce the overhead of indexing similar and duplicate chunks. Next, we propose a computational model of multi-level redundancy elimination to guide the design of a parallel computing scheme that reduces the time overhead of redundancy elimination, and we further study redundancy workload prediction to better schedule the parallel tasks. Finally, we propose exploiting backup data semantics to design a fragmentation elimination scheme and a restore cache replacement policy that improve restore performance after multi-level redundancy elimination. This project can provide new methods for improving multi-level redundancy elimination techniques in backup storage systems and thus promote the wider use of multi-level redundancy elimination.
English keywords: Data Deduplication; Backup Storage Systems; Delta Compression; Restore Performance; Redundancy Elimination
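
Illustrative note: the three levels named in the abstract (duplicate chunks, similar chunks, duplicate strings) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the project's method: the fixed `CHUNK_SIZE`, the toy `similarity_feature` function, and the `eliminate`/`restore` helpers are all hypothetical names introduced here, and dictionary-primed zlib compression stands in for a real delta encoder.

```python
import hashlib
import zlib

# Hypothetical fixed chunk size; real backup systems often use content-defined chunking.
CHUNK_SIZE = 4096


def similarity_feature(chunk: bytes, window: int = 16) -> int:
    """Toy similarity feature (an assumption): the largest CRC32 over all sliding windows.

    Chunks that differ in only a few bytes share most windows, so they
    usually (not always) end up with the same feature value.
    """
    if len(chunk) <= window:
        return zlib.crc32(chunk)
    return max(zlib.crc32(chunk[i:i + window])
               for i in range(len(chunk) - window + 1))


def restore_chunk(store: dict, cid: int) -> bytes:
    """Rebuild one chunk from its stored (kind, payload, base_id) entry."""
    kind, payload, base_id = store[cid]
    if kind == "full":
        return zlib.decompress(payload)
    # "delta" entries were compressed with the base chunk as a preset dictionary.
    base = restore_chunk(store, base_id)
    d = zlib.decompressobj(zdict=base)
    return d.decompress(payload) + d.flush()


def eliminate(data: bytes):
    """Return (store, recipe): unique chunk entries plus the chunk-id sequence."""
    fingerprints = {}  # SHA-1 digest -> chunk id (duplicate-chunk index)
    features = {}      # similarity feature -> id of a fully stored base chunk
    store = {}         # chunk id -> (kind, payload, base_id)
    recipe = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha1(chunk).digest()
        if fp in fingerprints:
            # Level 1: exact duplicate chunk, store only a reference.
            recipe.append(fingerprints[fp])
            continue
        cid = len(store)
        feat = similarity_feature(chunk)
        base_id = features.get(feat)
        if base_id is not None:
            # Level 2: similar chunk, encode it relative to the base chunk.
            base = restore_chunk(store, base_id)
            c = zlib.compressobj(zdict=base)
            store[cid] = ("delta", c.compress(chunk) + c.flush(), base_id)
        else:
            # Level 3: no duplicate or similar chunk, fall back to plain compression.
            store[cid] = ("full", zlib.compress(chunk), None)
            features[feat] = cid
        fingerprints[fp] = cid
        recipe.append(cid)
    return store, recipe


def restore(store: dict, recipe: list) -> bytes:
    """Reassemble the original byte stream from the recipe."""
    return b"".join(restore_chunk(store, cid) for cid in recipe)
```

In this sketch the two in-memory dictionaries play the role of the duplicate and similarity indexes whose overhead the abstract discusses, and `restore` follows the recipe chunk by chunk, which is where fragmentation and restore caching would matter in a disk-based system.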