Low-rank compression is an important model compression strategy for obtaining compact neural network models. Because the rank values directly determine both model complexity and model accuracy, proper selection of layer-wise ranks is critical. To date, although many low-rank compression approaches have been proposed, selecting the ranks either manually or automatically, they suffer from costly manual trials or unsatisfactory compression performance. In addition, none of the existing works is designed in a hardware-aware way, which limits the practical performance of the compressed models on real-world hardware platforms. To address these challenges, in this paper we propose HALOC, a hardware-aware automatic low-rank compression framework. By interpreting automatic rank selection from an architecture search perspective, we develop an end-to-end solution that determines suitable layer-wise ranks in a differentiable and hardware-aware way. We further propose design principles and a mitigation strategy to efficiently explore the rank space and reduce the potential interference problem. Experimental results on different datasets and hardware platforms demonstrate the effectiveness of the proposed approach. On the CIFAR-10 dataset, HALOC achieves 0.07% and 0.38% accuracy increases over the uncompressed ResNet-20 and VGG-16 models with 72.20% and 86.44% fewer FLOPs, respectively. On the ImageNet dataset, HALOC achieves 0.9% higher top-1 accuracy than the original ResNet-18 model with 66.16% fewer FLOPs. HALOC also achieves 0.66% higher top-1 accuracy than the state-of-the-art automatic low-rank compression solution with lower computational and memory costs. In addition, HALOC delivers practical speedups on different hardware platforms, verified by measurement results on a desktop GPU, an embedded GPU, and an ASIC accelerator.