Experimental evaluation of neutron-induced errors on a multicore RISC-V platform

RISC-V architectures have gained importance in recent years thanks to their flexibility and open-source Instruction Set Architecture (ISA), which allow developers to adopt RISC-V processors efficiently and at reduced cost in several domains. In safety-critical and mission-critical domains, execution must be reliable, since a fault can compromise the system's ability to operate correctly. However, the error rate of applications running on RISC-V processors has not been evaluated as extensively as it has been for standard x86 processors. In this work, we investigate the error rate of GAP8, a commercial multicore RISC-V ASIC platform, exposed to a neutron beam. We show that for compute-intensive applications, such as Convolutional Neural Networks (CNNs) for classification, the error rate can be 3.2× higher than the average error rate. Additionally, we find that the majority (96.12%) of the errors in the CNN do not cause misclassifications. Finally, we evaluate the events that interrupt application execution on GAP8 and show that the main cause of erroneous interruptions is application hangs (e.g., due to an infinite loop or a race condition).
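The abstract does not state how error rates or CNN error categories are computed. Neutron beam studies commonly report a cross section (observed errors divided by neutron fluence) and scale it to Failures In Time (FIT) using a reference terrestrial flux, and they often split CNN output corruptions into those that keep the predicted class and those that change it. The sketch below assumes those conventions; the function names, the 13 n/(cm²·h) reference flux, the labels "masked", "tolerable" and "critical", and the example numbers are illustrative assumptions, not the authors' code or data.

```python
# Illustrative sketch only: assumes the usual cross-section / FIT convention of
# neutron beam experiments. Names, labels and numbers are hypothetical.

# Assumed reference sea-level neutron flux (>10 MeV, New York City), n/(cm^2*h)
REFERENCE_FLUX = 13.0


def cross_section(errors: int, fluence: float) -> float:
    """Cross section in cm^2: observed errors divided by fluence (n/cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return errors / fluence


def fit_rate(sigma: float, flux: float = REFERENCE_FLUX) -> float:
    """Failures In Time: expected errors per 10^9 device-hours at the given flux."""
    return sigma * flux * 1e9


def argmax(scores) -> int:
    """Index of the highest score (the predicted class)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def classify_cnn_output(golden, observed) -> str:
    """Label one run against a fault-free (golden) output.

    'masked'    -> output identical to golden (no observable error)
    'tolerable' -> scores corrupted but the top-1 class is unchanged
    'critical'  -> the top-1 class changed (a misclassification)
    """
    if list(observed) == list(golden):
        return "masked"
    return "tolerable" if argmax(golden) == argmax(observed) else "critical"


def tolerable_fraction(labels) -> float:
    """Share of observed errors that did not change the predicted class."""
    errors = [label for label in labels if label != "masked"]
    if not errors:
        return 0.0
    return sum(label == "tolerable" for label in errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical numbers: 50 errors observed over a fluence of 1e11 n/cm^2.
    sigma = cross_section(50, 1e11)
    print(f"cross section = {sigma:.2e} cm^2, FIT = {fit_rate(sigma):.2f}")
```

With these hypothetical inputs the cross section is 5e-10 cm² and the FIT rate about 6.5; a ratio such as the 3.2× reported for the CNN would compare one application's cross section against the mean over the evaluated applications.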
