Mixed-precision quantization is a powerful tool for reducing the memory and compute costs of neural network workloads by assigning different bit-width precisions to different compute operations. Recent research has shown significant progress in applying mixed-precision quantization techniques to reduce the memory footprint of various workloads while preserving task performance. Prior work, however, has often ignored additional objectives, such as bit-operations, that are important for deploying workloads on hardware. Here we present a flexible and scalable framework for automated mixed-precision quantization that optimizes multiple objectives. Our framework relies on Neuroevolution-Enhanced Multi-Objective Optimization (NEMO), a novel search method, to find Pareto optimal mixed-precision configurations for memory and bit-operations objectives. Within NEMO, a population is divided into structurally distinct sub-populations (species) which jointly form the Pareto frontier of solutions for the multi-objective problem. At each generation, species are re-sized in proportion to the goodness of their contribution to the Pareto frontier. This allows NEMO to leverage established search techniques and neuroevolution methods to continually improve the goodness of the Pareto frontier. In our experiments we apply a graph-based representation to describe the underlying workload, enabling us to deploy graph neural networks trained by NEMO to find Pareto optimal configurations for various workloads trained on ImageNet. Compared to the state of the art, we achieve competitive results on memory compression and superior results on compute compression for MobileNet-V2, ResNet50 and ResNeXt-101-32x8d. A deeper analysis of the results obtained by NEMO also shows that both the graph representation and the species-based approach are critical in finding effective configurations for all workloads.
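The species re-sizing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`dominates`, `pareto_front`, `resize_species`), the `floor` parameter, and the choice to measure a species' contribution by how many of its members lie on the joint Pareto frontier are all assumptions made for illustration. The abstract only states that species are re-sized in proportion to the goodness of their contribution. Each candidate is scored as a (memory, bit-operations) pair, and both objectives are minimized.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (memory, bit-operations); both are minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, in input order (simple quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def resize_species(scores: Dict[str, List[Point]], total: int, floor: int = 1) -> Dict[str, int]:
    """Split `total` population slots across species.

    Assumption: a species' contribution is the number of its members on the
    joint Pareto frontier. Every species keeps at least `floor` slots.
    """
    all_points = [p for pts in scores.values() for p in pts]
    front = set(pareto_front(all_points))
    counts = {name: sum(1 for p in pts if p in front) for name, pts in scores.items()}

    spare = total - floor * len(scores)
    weight = sum(counts.values())
    if weight == 0:  # no scored candidates yet: fall back to an even split
        counts = {name: 1 for name in scores}
        weight = len(scores)

    sizes = {name: floor + (spare * c) // weight for name, c in counts.items()}
    # Integer division can leave a few slots unassigned; give them to the
    # species with the largest contribution first.
    leftover = total - sum(sizes.values())
    for name in sorted(counts, key=counts.get, reverse=True)[:leftover]:
        sizes[name] += 1
    return sizes


# Hypothetical scores for two species, "a" and "b".
example_scores = {
    "a": [(1.0, 5.0), (2.0, 4.0), (3.0, 6.0)],
    "b": [(4.0, 1.0), (5.0, 5.0)],
}
example_sizes = resize_species(example_scores, total=10)
```

In this toy example, species "a" holds two of the three frontier points and so receives the larger share of the ten slots. A quality-aware measure such as hypervolume contribution could replace the simple count without changing the allocation logic.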