Transistor aging is one of the major concerns facing designers in advanced technologies. It profoundly degrades the reliability of circuits over their lifetime: as transistors slow down, timing violations cause errors unless large guardbands are included, and those guardbands lead to considerable performance losses. For Neural Processing Units (NPUs), where increasing inference speed is the primary goal, such performance losses cannot be tolerated. In this work, we are the first to propose reliability-aware quantization to eliminate aging effects in NPUs while completely removing guardbands. Our technique compensates for the aging-induced delay increase of the NPU at the cost of a graceful degradation in inference accuracy over time. Our evaluation over ten state-of-the-art neural network architectures trained on the ImageNet dataset demonstrates that, over an entire lifetime of 10 years, the average accuracy loss is merely 3%. At the same time, our technique achieves 23% higher performance due to the elimination of the aging guardband.