Compression of deep neural networks has become a necessary stage for optimizing model inference on resource-constrained hardware. This paper presents FITCompress, a method that unifies layer-wise mixed-precision quantization and pruning under a single heuristic, as an alternative to neural architecture search and Bayesian-based techniques. FITCompress combines the Fisher Information Metric with path planning through the compression space to select optimal configurations under size and operation constraints, using single-shot fine-tuning. Experiments on ImageNet validate the method and show that our approach achieves a better accuracy-efficiency trade-off than the baselines. Beyond computer vision benchmarks, we experiment with the BERT model on a language understanding task, paving the way towards its optimal compression.
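The core idea, scoring each layer's sensitivity with an empirical Fisher information estimate and then walking greedily through the compression space until a budget is met, can be sketched as follows. This is a toy NumPy illustration under stated assumptions, not the authors' implementation: the two-layer model, the layer names `W1`/`W2`, the bit-width options, and the bit budget are all invented for illustration, and the "path planning" is reduced to a simple greedy pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network on synthetic binary-classification data.
X = rng.normal(size=(64, 8))
y = (X @ rng.normal(size=8) > 0).astype(float)
W1 = rng.normal(size=(8, 4)) * 0.1  # hypothetical layer 1
W2 = rng.normal(size=(4, 1)) * 0.1  # hypothetical layer 2

def forward(X):
    h = np.tanh(X @ W1)
    p = 1 / (1 + np.exp(-(h @ W2).ravel()))
    return h, p

def fisher_traces():
    """Diagonal empirical Fisher per layer: mean squared per-example
    gradient of the cross-entropy loss w.r.t. that layer's weights."""
    h, p = forward(X)
    err = p - y                                   # dL/d(logit) for BCE
    gW2 = h * err[:, None]                        # per-example grad of W2
    dpre = err[:, None] * W2.T * (1 - h**2)       # grad at tanh pre-activation
    gW1 = X[:, :, None] * dpre[:, None, :]        # per-example grad of W1
    return {"W1": np.mean(np.sum(gW1**2, axis=(1, 2))),
            "W2": np.mean(np.sum(gW2**2, axis=1))}

def plan_bits(traces, options=(8, 4, 2), budget=10):
    """Greedy 'path' through the compression space: lower the precision of
    the least Fisher-sensitive layers first, until the bit budget is met."""
    bits = {name: options[0] for name in traces}
    for name in sorted(traces, key=traces.get):   # least sensitive first
        i = 0
        while sum(bits.values()) > budget and i < len(options) - 1:
            i += 1
            bits[name] = options[i]
    return bits

traces = fisher_traces()
bits = plan_bits(traces)
print(traces, bits)
```

In this sketch, the Fisher trace stands in for how much the loss would change if a layer's weights were perturbed by quantization or pruning; FITCompress's actual criterion and search procedure are described in the paper itself.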