Approximate computing is known for enhancing the energy efficiency of deep neural network accelerators by introducing inexactness with a tolerable accuracy loss. However, small accuracy variations may increase the sensitivity of these accelerators to undesired subtle disturbances, such as permanent faults. The impact of permanent faults in accurate deep neural network (AccDNN) accelerators has been thoroughly investigated in the literature. Conversely, the impact of permanent faults, and their mitigation, in approximate DNN (AxDNN) accelerators is vastly under-explored. Towards this, we first present an extensive fault resilience analysis of approximate multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) built with the state-of-the-art Evoapprox8b multipliers, on GPU and TPU accelerators. Then, we propose a novel fault mitigation method, fault-aware retuning of weights (Fal-reTune), which retunes the weights using a weight mapping function in the presence of faults to recover classification accuracy. To evaluate the fault resilience and the effectiveness of our proposed mitigation method, we use the widely adopted MNIST, Fashion-MNIST, and CIFAR10 datasets. Our results demonstrate that permanent faults exacerbate the accuracy loss in AxDNNs compared to AccDNN accelerators. For instance, a permanent fault in an AxDNN can lead to a 56\% accuracy loss, whereas the same faulty bit leads to only a 4\% accuracy loss in an AccDNN accelerator. We empirically show that our proposed Fal-reTune mitigation method improves the performance of AxDNNs by up to 98\%, even at fault rates of up to 50\%. Furthermore, we observe that the fault resilience of AxDNNs is orthogonal to their energy efficiency.
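The abstract does not detail the weight mapping function behind Fal-reTune, so the following is a minimal, hypothetical Python sketch of the idea it describes: model a permanent stuck-at fault on one bit line of the quantized weight operand, then remap ("retune") each weight to the representable level whose faulty image best matches the intended fault-free value. All names here (`quantize`, `apply_stuck_at`, `fal_retune`), the assumed 8-bit quantization, and the nearest-level search are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

QUANT_BITS = 8  # assumed operand width, matching the 8-bit Evoapprox8b multipliers

def quantize(w, scale):
    """Quantize floating-point weights to signed 8-bit integers (assumed model)."""
    return np.clip(np.round(w * scale), -128, 127).astype(np.int32)

def apply_stuck_at(q, bit, stuck_value):
    """Model a permanent stuck-at fault on one bit line of the weight operand."""
    u = q & 0xFF                                    # two's-complement byte view
    u = (u | (1 << bit)) if stuck_value else (u & ~(1 << bit))
    return ((u ^ 0x80) - 0x80).astype(np.int32)     # sign-extend back to signed

def fal_retune(w, scale, bit, stuck_value):
    """Hypothetical weight-mapping step: remap each weight to the 8-bit level
    whose *faulty* image is closest to the intended fault-free value, so the
    faulty datapath effectively computes with (near-)correct operands."""
    levels = np.arange(-128, 128, dtype=np.int32)
    faulty = apply_stuck_at(levels, bit, stuck_value)   # faulty image of every level
    target = quantize(w, scale).reshape(-1, 1)
    best = levels[np.abs(faulty[None, :] - target).argmin(axis=1)]
    return best.reshape(w.shape)                        # retuned integer weights
```

Under these assumptions, loading `fal_retune(W, s, bit, stuck_value)` in place of `quantize(W, s)` makes the faulty hardware see operands close to the intended ones, since `apply_stuck_at(fal_retune(W, s, bit, v), bit, v)` approximates `quantize(W, s)` wherever the stuck bit leaves a nearby representable level.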