Vision Transformers (ViTs), which leverage the self-attention mechanism, have shown superior performance on many classical vision tasks compared to convolutional neural networks (CNNs) and have gained increasing popularity recently. Existing ViT works mainly optimize performance and accuracy, but the reliability issues of ViTs induced by hardware faults in large-scale VLSI designs have generally been overlooked. In this work, we study the reliability of ViTs and, for the first time, investigate their vulnerability at different architectural granularities, ranging from models and layers to modules and patches. The investigation reveals that ViTs with the self-attention mechanism are generally more resilient in linear computing, including general matrix-matrix multiplication (GEMM) and fully connected (FC) layers, and exhibit a relatively even vulnerability distribution across patches. However, ViTs involve more fragile non-linear computing, such as softmax and GELU, than typical CNNs. Based on these observations, we propose an adaptive algorithm-based fault tolerance (ABFT) scheme to protect the linear computing, which is implemented with GEMMs of distinct sizes, and apply a range-based protection scheme to mitigate soft errors in the non-linear computing. According to our experiments, the proposed fault-tolerant approaches enhance ViT accuracy significantly with minor computational overhead in the presence of various soft errors.
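To make the checksum idea behind ABFT-protected GEMM concrete, the following is a minimal sketch in the classic Huang-Abraham style: the operands are augmented with a column-sum row and a row-sum column, and the product's checksums are re-verified afterwards. The function name `abft_gemm`, the floating-point tolerance `tol`, and the single-error correction logic are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def abft_gemm(A, B, tol=1e-6):
    """Checksum-protected C = A @ B (illustrative Huang-Abraham-style sketch).

    Assumes at most one faulty element in C; faults in the checksum
    row/column itself are detected but not corrected here.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2

    # Augment A with a column-sum row and B with a row-sum column.
    A_aug = np.vstack([A, A.sum(axis=0, keepdims=True)])  # (m+1, k)
    B_aug = np.hstack([B, B.sum(axis=1, keepdims=True)])  # (k, n+1)

    C_full = A_aug @ B_aug                                # (m+1, n+1)
    C = C_full[:m, :n]

    # Recompute checksums from C and compare against the carried ones.
    row_err = np.abs(C.sum(axis=1) - C_full[:m, n])       # per-row residual
    col_err = np.abs(C.sum(axis=0) - C_full[m, :n])       # per-column residual
    bad_rows = np.flatnonzero(row_err > tol)
    bad_cols = np.flatnonzero(col_err > tol)

    # A single faulty element lies at the intersection of the bad row and
    # column, and the row-checksum difference recovers its true value.
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        C[i, j] += C_full[i, n] - C[i].sum()
    return C, bool(len(bad_rows) or len(bad_cols))
```

Because the checksum rows add only O(mk + kn + mn) work on top of the O(mkn) GEMM, this style of protection keeps the relative overhead small for the large matrix sizes typical of ViT attention and FC layers.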
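For the non-linear computing, one plausible reading of a range-based protection scheme is to clip activations to a bound profiled from fault-free inference before the sensitive operation, so that a single flipped high-order bit cannot saturate, for example, the softmax distribution. The helper name `profiled_softmax` and the bound value below are hypothetical, not the paper's exact scheme.

```python
import numpy as np

def profiled_softmax(x, bound=50.0):
    """Range-protected softmax sketch.

    `bound` is an assumed activation limit obtained by profiling fault-free
    runs; values outside it are treated as corrupted and clipped.
    """
    x = np.clip(x, -bound, bound)           # suppress out-of-range values
    x = x - x.max(axis=-1, keepdims=True)   # standard numerical stabilization
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)
```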