Deep Learning Accelerators are prone to faults, which manifest as errors in Neural Networks. Fault tolerance in Neural Networks is crucial in real-time, safety-critical applications that require computation over long durations. Neural Networks with strong regularisation exhibit superior fault tolerance, but at the cost of classification accuracy. In view of the difference in functionality, a Neural Network is modelled as two separate networks, i.e., the Feature Extractor with an unsupervised learning objective and the Classifier with a supervised learning objective. The traditional approach of training the entire network with a single supervised learning objective is insufficient to achieve the objectives of the individual components optimally. In this work, a novel multi-criteria objective function is proposed, combining unsupervised training of the Feature Extractor followed by supervised tuning with the Classifier Network. The unsupervised training solves two games simultaneously in the presence of adversary neural networks whose objectives conflict with that of the Feature Extractor. The first game minimises the loss in reconstructing the input image for indistinguishability given the features from the Extractor, in the presence of a generative decoder. The second game solves a minimax constrained optimisation for distributional smoothing of the feature space to match a prior distribution, in the presence of a Discriminator network. The resulting strongly regularised Feature Extractor is combined with the Classifier Network for supervised fine-tuning. The proposed Adversarial Fault Tolerant Neural Network Training is scalable to large networks and is independent of the architecture. Evaluation on the benchmark datasets FashionMNIST and CIFAR10 indicates that the resulting networks have high accuracy with superior tolerance to stuck-at-0 faults compared to widely used regularisers.
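The two games in the unsupervised stage can be illustrated with a minimal sketch of the loss terms involved. This is not the authors' implementation: the abstract does not specify the loss forms, so mean squared error for reconstruction, binary cross-entropy for the Discriminator, the non-saturating form of the Extractor's adversarial term, and the weighting parameter `lam` are all assumptions made here for illustration, as are the function names.

```python
import math

EPS = 1e-12  # small constant to avoid log(0); an illustrative choice


def reconstruction_loss(x, x_hat):
    """Game 1 (assumed MSE): error between an input and the decoder's reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def discriminator_loss(d_on_prior, d_on_features):
    """Game 2, Discriminator side (assumed binary cross-entropy).

    d_on_prior:    Discriminator outputs in [0, 1] for samples from the prior.
    d_on_features: Discriminator outputs in [0, 1] for extracted features.
    The Discriminator wants 1 on prior samples and 0 on extracted features.
    """
    real = -sum(math.log(p + EPS) for p in d_on_prior) / len(d_on_prior)
    fake = -sum(math.log(1.0 - p + EPS) for p in d_on_features) / len(d_on_features)
    return real + fake


def extractor_adversarial_loss(d_on_features):
    """Game 2, Extractor side (assumed non-saturating form).

    The Extractor wants the Discriminator to output 1 on its features,
    i.e. to mistake them for samples from the prior.
    """
    return -sum(math.log(p + EPS) for p in d_on_features) / len(d_on_features)


def unsupervised_objective(x, x_hat, d_on_features, lam=1.0):
    """Combined Extractor objective; lam is a hypothetical trade-off weight."""
    return reconstruction_loss(x, x_hat) + lam * extractor_adversarial_loss(d_on_features)


if __name__ == "__main__":
    x = [0.0, 1.0, 0.5]
    x_hat = [0.1, 0.9, 0.5]
    # Features that fool the Discriminator (outputs near 1) give a lower objective.
    print(unsupervised_objective(x, x_hat, d_on_features=[0.9, 0.8]))
    print(unsupervised_objective(x, x_hat, d_on_features=[0.2, 0.1]))
```

In an actual training loop the Discriminator and the Extractor would be updated in alternation on their respective losses, after which the Extractor would be fine-tuned together with the Classifier on a supervised loss; those steps are omitted from this sketch.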