With the rapid evolution of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When neural networks (NNs) are deployed on devices in complex environments, various faults can occur: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, and malicious attacks. The safety risk of deploying NNs is therefore drawing increasing attention. In this paper, after analyzing the possible faults in various types of NN accelerators, we formalize and implement several fault models from the algorithmic perspective. We propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are resilient to the various faults found in today's devices. We then incorporate fault-tolerant training (FTT) into the search process to achieve better results; we refer to this combination as FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures significantly outperform manually designed baseline architectures, with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, under the same fault settings, F-FTT-Net, discovered under the feature fault model, achieves an accuracy of 86.2% (vs. 68.1% for MobileNet-V2), and W-FTT-Net, discovered under the weight fault model, achieves an accuracy of 69.6% (vs. 60.8% for ResNet-20). By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the capacity of the model, and the connection pattern all influence the fault resilience of NN models.
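To make the idea of an algorithmic weight fault model concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not spell out the fault models, so the sketch assumes one common abstraction of soft errors in weight memory: each bit of an 8-bit quantized weight flips independently with a fixed probability. The function name `inject_bit_flips` and the parameter `flip_prob` are hypothetical and chosen only for illustration.

```python
import numpy as np


def inject_bit_flips(q_weights, flip_prob, rng):
    """Return a copy of int8 weights with random bit flips applied.

    Assumption (not from the source): every bit of every 8-bit weight
    flips independently with probability `flip_prob`.
    """
    # Reinterpret the int8 values as raw bytes so XOR acts on bit patterns.
    bits = q_weights.view(np.uint8)
    # One Bernoulli draw per bit: shape is weights.shape + (8,).
    flips = rng.random(bits.shape + (8,)) < flip_prob
    # Pack the 8 per-bit flags of each weight into a single uint8 XOR mask.
    mask = np.packbits(flips, axis=-1).reshape(bits.shape)
    # XOR flips exactly the selected bits; view the result as int8 again.
    return (bits ^ mask).view(np.int8)


rng = np.random.default_rng(0)
weights = np.array([[0, 1, -1], [127, -128, 42]], dtype=np.int8)
faulty = inject_bit_flips(weights, flip_prob=0.01, rng=rng)
```

Under these assumptions, a model's accuracy could be measured with `faulty` substituted for `weights`, and the same perturbation could be applied during training as one possible form of fault-tolerant training.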