On-chip memories (usually based on static RAMs, or SRAMs) are crucial components of many computing devices, including heterogeneous devices such as GPUs, FPGAs, and ASICs, for achieving high performance. Modern workloads such as Deep Neural Networks (DNNs) running on these heterogeneous fabrics depend heavily on the on-chip memory architecture for efficient acceleration. Hence, improving the energy efficiency of such memories directly improves the efficiency of the whole system. A common way to save energy is undervolting, i.e., scaling the supply voltage below its nominal level. Such systems can be safely undervolted without incurring faults down to a certain voltage limit; this safe range is called the voltage guardband. However, reducing the voltage below the guardband without decreasing the frequency causes timing-based faults. In this paper, we propose MoRS, a framework that generates the first approximate undervolting fault model, built from real faults extracted from experimental undervolting studies on SRAMs. We inject the faults generated by MoRS into the on-chip memory of a DNN accelerator to evaluate the resilience of the system under test. MoRS has the advantage of simplicity, requiring no time-consuming experiments, while remaining sufficiently accurate compared with a fully random fault injection approach. We evaluate MoRS on popular DNN workloads by mapping weights to SRAMs and measuring the accuracy difference between the output of MoRS and the real data. Our results show that the maximum difference between the real fault data and the MoRS fault model is 6.21%, whereas the maximum difference between the real data and the random fault injection model is 23.2%. In terms of average proximity to the real data, MoRS outperforms the random fault injection approach by 3.21x.
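As an illustration of the baseline that MoRS is compared against, the following is a minimal sketch of fully random fault injection into weights stored in memory. It is not the MoRS model itself, which is built from measured SRAM fault data that is not reproduced here. The function names, the 8-bit word width, the uniform per-bit `fault_rate`, and the example weight values are all assumptions made for illustration.

```python
import random


def inject_random_bit_flips(weights, fault_rate, bits=8, seed=0):
    """Return a copy of `weights` with random bit flips applied.

    Hypothetical baseline: every bit of every stored word flips
    independently with probability `fault_rate`. Real undervolting
    faults are not uniform like this; that gap is what an empirical
    fault model is meant to close.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    faulty = []
    for word in weights:
        mask = 0
        for bit in range(bits):
            if rng.random() < fault_rate:
                mask |= 1 << bit  # mark this bit position as faulty
        faulty.append(word ^ mask)  # XOR flips exactly the marked bits
    return faulty


def count_flipped_bits(original, faulty):
    """Count the bit positions that differ between two word lists."""
    return sum(bin(a ^ b).count("1") for a, b in zip(original, faulty))


# Assumed example: four 8-bit quantized weights as unsigned integers.
weights = [0x00, 0x7F, 0x80, 0xFF]
faulty = inject_random_bit_flips(weights, fault_rate=0.1)
print(count_flipped_bits(weights, faulty))
```

In a resilience study of this kind, the corrupted weights would then be loaded into the network and its classification accuracy compared against the fault-free run.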