For effective and efficient deep neural network inference, it is desirable to achieve state-of-the-art accuracy with the simplest networks, requiring the least computation, memory, and power. Quantizing networks to lower precision is a powerful technique for simplifying them, and it is generally desirable to quantize as aggressively as possible without incurring significant accuracy degradation. Because each layer of a network may have a different sensitivity to quantization, mixed-precision quantization methods selectively tune the precision of individual layers to minimize the drop in task performance (e.g., accuracy). To estimate the impact of layer precision choice on task performance, two methods are introduced: i) Entropy Approximation Guided Layer selection (EAGL), which is fast and uses the entropy of the weight distribution, and ii) Accuracy-aware Layer Precision Selection (ALPS), which is straightforward and relies on single-epoch fine-tuning after layer precision reduction. Using EAGL and ALPS for layer precision selection, full-precision accuracy is recovered with a mix of 4-bit and 2-bit layers for the ResNet-50 and ResNet-101 classification networks. In our own commensurate comparison against leading mixed-precision layer selection techniques, the two methods demonstrate improved performance across the entire accuracy-throughput frontier for these networks and equivalent performance for the PSPNet segmentation network, while requiring orders of magnitude less compute time to reach a solution.
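To make the entropy idea concrete, here is a minimal sketch of scoring layers by the entropy of their quantized weight distribution. It is not the EAGL procedure itself, whose exact formulation the abstract does not give. It assumes uniform min-max quantization, a histogram estimate of Shannon entropy, and the convention that lower-entropy layers are the first candidates for reduced precision. The names `quantize_uniform`, `entropy_bits`, and `rank_layers_by_entropy` are hypothetical.

```python
import math
from collections import Counter


def quantize_uniform(weights, bits):
    """Map each weight to one of 2**bits evenly spaced integer codes.

    Assumption: plain min-max uniform quantization, chosen only for illustration.
    """
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # A constant layer collapses onto a single code.
        return [0] * len(weights)
    scale = (levels - 1) / (hi - lo)
    return [round((w - lo) * scale) for w in weights]


def entropy_bits(codes):
    """Shannon entropy (in bits) of the empirical distribution of codes."""
    counts = Counter(codes)
    n = len(codes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rank_layers_by_entropy(layers, bits=4):
    """Return layer names sorted by ascending entropy, plus the raw scores.

    Assumption: layers whose quantized weights carry less entropy are treated
    as the first candidates for lower precision.
    """
    scores = {
        name: entropy_bits(quantize_uniform(w, bits))
        for name, w in layers.items()
    }
    return sorted(scores, key=scores.get), scores
```

Calling `rank_layers_by_entropy({"conv1": w1, "conv2": w2})` with flat lists of weights returns the layer names ordered from lowest to highest entropy. Such an ordering needs only one pass over the weights and no training, which is in keeping with the abstract's description of EAGL as fast. An ALPS-style estimate would instead fine-tune for one epoch after each precision reduction.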