With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes much more important to reduce unnecessary computation and increase execution speed. Prior methods towards this goal, including model compression and network architecture search (NAS), are largely performed independently and do not fully consider compiler-level optimizations, which are a must for mobile acceleration. In this work, we first propose (i) a general category of fine-grained structured pruning applicable to various DNN layers, and (ii) a comprehensive, compiler-based automatic code generation framework supporting different DNNs and different pruning schemes, which bridges the gap between model compression and NAS. We further propose NPAS, a compiler-aware unified network pruning and architecture search framework. To deal with the large search space, we propose a meta-modeling procedure based on reinforcement learning, with fast evaluation and Bayesian optimization, ensuring that the total number of training epochs remains comparable with that of representative NAS frameworks. Our framework achieves 6.7 ms, 5.9 ms, and 3.9 ms ImageNet inference times with 78.2%, 75% (MobileNet-V3 level), and 71% (MobileNet-V2 level) Top-1 accuracy, respectively, on an off-the-shelf mobile phone, consistently outperforming prior work.
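To make the pruning category concrete, here is a minimal numpy sketch of one fine-grained structured pattern in this spirit: block-based pruning, where a weight matrix is split into column blocks and each block keeps only its highest-magnitude columns, so the compiler sees a regular, dense pattern per block. This is an illustrative assumption, not the paper's exact scheme; the function name, block size, and per-block selection rule are all hypothetical.

```python
import numpy as np

def block_prune(weight: np.ndarray, block_size: int = 4, keep_cols: int = 2) -> np.ndarray:
    """Sketch of fine-grained structured pruning: within each column block,
    keep only the `keep_cols` columns with the largest L2 norm and zero the
    rest. The regular per-block pattern is what allows compiler-level code
    generation to emit dense, branch-free kernels for mobile devices."""
    out = weight.copy()
    _, cols = weight.shape
    for start in range(0, cols, block_size):
        block = out[:, start:start + block_size]  # view into `out`
        norms = np.linalg.norm(block, axis=0)     # rank columns by magnitude
        n_drop = max(block.shape[1] - keep_cols, 0)
        drop = np.argsort(norms)[:n_drop]         # smallest-norm columns
        block[:, drop] = 0.0
    return out

# Example: prune a random 8x8 layer weight, keeping 2 of every 4 columns.
w = np.random.randn(8, 8)
w_pruned = block_prune(w, block_size=4, keep_cols=2)
print((w_pruned != 0).mean())  # remaining density, ~0.5 here
```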