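Project title: Research on Chip-Level Fault Recovery Methods for Multi-Core Processors
Project number: No.61472100
Project type: General Program (面上项目)
Approval year: 2015
Project discipline: Automation technology; computer technology
Project author: Ji Zhenzhou (季振洲)
Author affiliation: Harbin Institute of Technology (哈尔滨工业大学)
Project funding: CNY 830,000 (83万元)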
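Chinese abstract (translated): The transient faults and partial permanent faults that multi-core processor chips face place higher demands on the design of highly reliable multi-core processor chips. Fault recovery at the software or system level cannot simultaneously guarantee transparency, determinism, and high availability. Hardware-based fault recovery has more advantages, and future multi-core processor chips will have higher integration density and better scalability, which makes chip-level fault recovery feasible. Starting from chip-level fault recovery, this project aims to provide highly reliable fault recovery methods and models for multi-core processors. It will study rollback recovery methods for multi-core processors based on a chip-level hardware checkpoint mechanism, to achieve transparency, generality, and high availability in transient fault recovery; on that basis, propose a new separated log recording mechanism to guarantee the determinism of transient fault recovery; study hardware evolution mechanisms under area constraints to achieve low-cost, fine-grained recovery from partial permanent faults in multi-core processors; and, by analyzing the execution modes of multi-core processors under multiple kinds of faults, study a multi-core processor chip model with multi-mode fault recovery, to ensure that multi-core processors adapt their recovery to the fault encountered. This research will provide an important theoretical foundation and technical support for the design of future highly reliable multi-core processor chips.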
Chinese keywords (translated): Computer architecture; multi-core processor; reconfigurable computing; network-on-chip
English abstract: Transient faults and partial permanent faults place critical demands on the design of highly reliable multi-core processors. Failure recovery solutions at the software or system level cannot simultaneously guarantee transparency, determinism, and high availability. Hardware-based solutions have more advantages, and future multi-core processors will have a higher degree of chip integration and better scalability, which makes chip-level failure recovery possible. This research will explore highly reliable chip-level failure recovery methods and models for multi-core processors. Chip-level, hardware-based checkpoint/restart methods for multi-core processors will be studied to achieve transparency, versatility, and high availability in transient fault recovery. We will then study a new separate memory race recording mechanism to guarantee deterministic execution of the transient fault recovery process. To achieve low-cost, fine-grained recovery from partial permanent faults in multi-core processors, we will study area-constrained evolvable hardware algorithms. By analyzing the execution modes of multi-core processors under various fault conditions, we will study multi-mode failure recovery models that guarantee an adaptive recovery process for multi-core processors. Our research will provide an important theoretical basis and technical support for the design of future highly reliable multi-core processors.
English keywords: Computer Architecture; Multicore Processor; Reconfigurable Computing; Network-on-Chip