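Project title: Research on Key Technologies for Accurate and Fast Structure-Level Soft Error Quantification
Project number: No.61472244
Project type: General Program
Year of approval: 2015
Discipline: Automation technology, computer technology
Principal investigator: Fu Yuzhuo (付宇卓)
Affiliation: Shanghai Jiao Tong University
Funding: CNY 800,000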
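Abstract (translated from Chinese): As integrated-circuit process dimensions keep shrinking, higher soft error rates and complex multi-bit upset patterns pose increasingly severe challenges to chip design. Current soft error quantification methods for processor structures suffer from long evaluation times and low accuracy, so they cannot give accurate, fast guidance for the fault-tolerant design of highly reliable microprocessors. To address these problems, this project proposes a soft error quantification and evaluation scheme for microprocessor architectures. Program instruction analysis yields the logical masking relations of errors; these relations are used to build a probabilistic transition graph of the vulnerable states of key storage structures; and a probabilistic graphical model then computes the soft error metrics of specific components accurately and quickly. Compared with existing similar methods, the advantages of this research are: 1) it comprehensively analyzes the propagation and masking effects of soft errors in storage components and combines them with a unified formal description covering both single-bit and multi-bit upset fault models, which substantially improves the accuracy of the evaluation results; 2) it introduces probabilistic graphical modeling into soft error quantification and, based on an analysis of the application scenario, defines a reasonable model simplification mechanism that effectively accelerates marginal probability computation with a large number of variables, so that accurate soft error values for key storage components can be obtained quickly.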
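Keywords (translated from Chinese): soft error; multi-bit upset; probabilistic graphical model; architectural vulnerability factor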
English abstract: As IC process dimensions continue to shrink, higher soft error rates and complex multiple cell upset (MCU) patterns pose increasing challenges to system-on-chip design. Existing methods for quantifying soft errors in processor architectures, such as Fault Injection (FI) and Architecturally Correct Execution (ACE) analysis, cannot provide fast and accurate guidelines for fault-tolerant microprocessor design because of their long evaluation times or low accuracy. To solve these problems, this proposal presents a novel architecture-level analysis framework for calculating the architectural vulnerability factor (AVF) of soft errors in microprocessors. By modeling the logic masking effect of soft errors in memory-based components through program instruction profiling, and by establishing a vulnerable ACE state transition diagram with a probabilistic graphical model (PGM), our method can provide precise and rapid quantitative soft error evaluation for specific memory components in the microarchitecture. Compared with existing similar methods, the advantages of this research are: 1) a comprehensive analysis of soft error propagation and masking effects in storage units, combined with a unified formal description of both Single Bit Upset (SBU) and MCU fault models, which substantially increases the accuracy of the AVF results; 2) the introduction of a probabilistic graphical modeling approach to handle the large number of ACE state relations, combined with scenario-specific analysis to develop a reasonable model simplification mechanism that effectively accelerates large-scale PGM inference, enabling quick and accurate AVF results for key memory components such as register files and L1 caches.
English keywords: Soft Error; Multiple Cell Upset (MCU); Probabilistic Graphical Model (PGM); Architectural Vulnerability Factor (AVF)