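Project title: Research on Survivability Theory and Key Technologies for Cloud Computing Based on Fault-Tolerance Cost
Project number: No.61272072
Project type: General Program
Approval year: 2013
Discipline: Automation technology, computer technology
Principal investigator: 邹德清 (Zou Deqing)
Affiliation: 华中科技大学 (Huazhong University of Science and Technology)
Funding: 800,000 CNY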
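Chinese abstract: While cloud computing offers tenants convenient, easy-to-use, low-cost services, its multi-tenancy and high concentration of resources mean that a system failure causes far greater losses than in traditional computing models. The survivability of cloud computing has two aspects: 1) for tenants, the platform must provide a low-cost fault-tolerance scheme that meets their quality-of-service requirements and reflects tenant tiers; this is the goal of "quality of service"-centered cloud computing and a key to its survival and development; 2) for the cloud platform, it is necessary to eliminate the correlation among faults in a dynamic, complex environment so as to isolate faults and limit the scope of their damage, to minimize fault-tolerance overhead, and to automate fault detection and recovery, reflecting the platform's self-healing capability. The research is planned at three levels: 1) fault complexity analysis, which performs correlation analysis at component granularity and uses it to study fault propagation; 2) survivability theory, which studies platform survivability modeling and a tenant-tier-oriented fault-tolerance cost theory from the platform and tenant perspectives respectively; 3) key survivability technologies, including a multi-level, low-cost fault-tolerance architecture and methods for automatic fault identification and recovery. The project results will be used to guide the design of highly reliable cloud computing platforms.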
Chinese keywords: cloud computing; survivability; fault detection; fault recovery; service fault tolerance
English abstract: While cloud computing brings easy-to-use, cost-saving services, its multi-tenancy and highly centralized resources mean that a system failure causes more serious losses than in traditional computing models. The survivability of cloud computing has two aspects: 1) cloud tenants need a fault-tolerance solution that meets their quality-of-service requirements at minimal cost and reflects the multi-level nature of tenants; this is the goal of quality-of-service-oriented cloud computing and a key to its survival and development; 2) cloud platforms need to eliminate the correlation of failures in a dynamic and complex environment so as to isolate faults and narrow the scope of their influence, minimize fault-tolerance cost, and automate fault detection and recovery, reflecting the platforms' self-healing ability. The proposal pursues research at three levels: 1) analysis of fault complexity, which studies fault propagation based on fault correlation analysis at the granularity of platform components; 2) survivability theory, which studies platform survivability modeling and the tenant-level-oriented fault-tolerance cost theory from the platform's and the tenants' points of view respectively; 3) key tech
English keywords: Cloud Computing; Survivability; Fault Detection; Fault Recovery; Service Fault-Tolerance