Power consumption is a major obstacle to deploying deep neural networks (DNNs) on end devices. Existing approaches for reducing power consumption rely on quite general principles, including avoidance of multiplication operations and aggressive quantization of weights and activations. However, these methods do not account for the precise power consumed by each module in the network, and are therefore suboptimal. In this paper we develop accurate power consumption models for all arithmetic operations in the DNN, under various working conditions. We reveal several important factors that have been overlooked to date. Based on our analysis, we present PANN (power-aware neural network), a simple approach for approximating any full-precision network by a low-power fixed-precision variant. Our method can be applied to a pre-trained network, and can also be used during training to achieve improved performance. In contrast to previous methods, PANN incurs only a minor degradation in accuracy w.r.t. the full-precision version of the network, even when working at the power budget of a 2-bit quantized variant. In addition, our scheme allows seamlessly traversing the power-accuracy trade-off at deployment time, which is a major advantage over existing quantization methods that are restricted to specific bit widths.
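To make the contrast concrete, the following is a minimal sketch of the standard symmetric uniform quantization that the abstract refers to when mentioning "aggressive quantization" and a "2-bit quantized variant"; it is not PANN's scheme, and the function name and details are illustrative assumptions. Committing to a fixed `bits` value at quantization time is what ties such methods to a specific bit width and prevents traversing the power-accuracy trade-off at deployment.

```python
# Illustrative sketch of conventional b-bit uniform weight quantization
# (an assumed baseline, not the PANN method described in the paper).
import numpy as np

def uniform_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Map a weight tensor onto 2**bits evenly spaced levels (symmetric)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    w_max = np.max(np.abs(w))
    if w_max == 0:
        return np.zeros_like(w)
    scale = w_max / qmax                      # step size: largest weight maps to qmax
    q = np.clip(np.round(w / scale), qmin, qmax)
    return q * scale                          # de-quantized fixed-precision weights

# Example: a 2-bit variant, the power budget referenced in the abstract.
w = np.random.randn(4, 4).astype(np.float32)
w_q = uniform_quantize(w, bits=2)
```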