Rapid reduction in CMOS device size has placed billions of transistors on a chip and enabled the integration of many cores, which brings challenges such as increased power dissipation, thermal dissipation, and the occurrence of transient and permanent faults. Mitigating transient and permanent faults at the core level has become an important design parameter in multi-core systems. Core-level techniques are redundancy-based fault mitigation techniques that improve the lifetime reliability of multi-core systems. In an asymmetric multi-core system, using smaller cores to provide fault tolerance to larger cores is a core-level fault mitigation technique that has gained momentum and attention from many researchers. This paper presents an economical, asymmetric multi-core system with one instruction cores (MCS-OIC). The term Hardware Cost Estimation signifies power and area estimation for MCS-OIC. In MCS-OIC, an OIC is a warm standby redundant core. OICs provide functional support to conventional cores for shorter periods of time. To evaluate the idea, different configurations of MCS-OIC are synthesized for FPGA and ASIC. The maximum power overhead and maximum area overhead are 0.46% and 11.4%, respectively. The behavior of OICs in MCS-OIC is modelled using a One-Shot System (OSS) model for reliability analysis. The OSS model parameters, namely readiness, wakeup probability, and start-up strategy, are mapped to the multi-core system with OICs. Expressions for system reliability are derived, and system reliability is estimated for special cases.
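To give a feel for how a wakeup probability enters a standby reliability estimate, the following is a minimal sketch of the textbook warm-standby formula for one active core backed by one standby core. It is not the OSS model or the expressions derived in the paper. The function name `warm_standby_reliability` and its parameters are hypothetical, and the sketch assumes exponentially distributed lifetimes, a constant dormant failure rate for the standby, and, as a simplification, that the standby fails at the same rate as the primary once it takes over.

```python
import math


def warm_standby_reliability(t, lam_active, lam_dormant, p_wakeup):
    """Textbook reliability at time t of one active core with one warm standby.

    Assumptions (illustrative, not taken from the paper):
      - the active core fails at constant rate lam_active,
      - the dormant standby fails at constant rate lam_dormant,
      - the standby wakes up successfully with probability p_wakeup,
      - once awake, the standby also fails at rate lam_active.
    """
    if min(t, lam_active, lam_dormant) < 0 or not 0.0 <= p_wakeup <= 1.0:
        raise ValueError("rates and time must be >= 0, p_wakeup must be in [0, 1]")

    # Probability that the primary core survives the whole interval [0, t].
    primary_survives = math.exp(-lam_active * t)

    # Integral of exp(-lam_dormant * tau) over [0, t]: the weight of the
    # standby still being healthy at the moment the primary fails.
    if lam_dormant == 0.0:
        standby_ready = t  # limit of (1 - exp(-x * t)) / x as x -> 0
    else:
        standby_ready = -math.expm1(-lam_dormant * t) / lam_dormant

    # Primary fails at some tau <= t, standby is alive, wakes up, and then
    # survives the remaining t - tau as the active core.
    takeover = p_wakeup * lam_active * primary_survives * standby_ready
    return primary_survives + takeover


if __name__ == "__main__":
    # Hypothetical rates, chosen only to exercise the function.
    for p in (0.0, 0.5, 1.0):
        r = warm_standby_reliability(t=1000.0, lam_active=1e-4,
                                     lam_dormant=1e-5, p_wakeup=p)
        print(f"p_wakeup={p:.1f}  R(t)={r:.6f}")
```

With `p_wakeup = 0` the expression reduces to the reliability of a single core, and with `lam_dormant = 0` and `p_wakeup = 1` it reduces to the classical cold-standby result, which gives two quick sanity checks on the sketch.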