The increasing computational cost of deep neural network models limits the applicability of intelligent applications on resource-constrained edge devices. While a number of neural network pruning methods have been proposed to compress such models, prevailing approaches focus only on parametric operators (e.g., convolution), which may miss optimization opportunities. In this paper, we present a novel fusion-catalyzed pruning approach, called FuPruner, which simultaneously optimizes parametric and non-parametric operators to accelerate neural networks. We introduce an aggressive fusion method that equivalently transforms a model, extending the optimization space of pruning and enabling non-parametric operators to be pruned in the same manner as parametric ones, and we apply a dynamic filter pruning method to reduce the computational cost of models while meeting the accuracy requirement. Moreover, FuPruner provides configurable optimization options for controlling fusion and pruning, allowing much more flexible performance-accuracy trade-offs to be made. Evaluation with state-of-the-art residual neural networks on five representative intelligent edge platforms, Jetson TX2, Jetson Nano, Edge TPU, NCS, and NCS2, demonstrates the effectiveness of our approach, which accelerates the inference of models on the CIFAR-10 and ImageNet datasets.
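To make the core idea concrete, the sketch below (illustrative only, not FuPruner's actual implementation; all function names are assumptions) shows how a non-parametric operator such as average pooling can be rewritten as an equivalent parametric operator, here a fixed-weight depthwise convolution. Once expressed with weights, such an operator becomes amenable to the same filter-pruning machinery as an ordinary convolution:

```python
import numpy as np

def avg_pool2d(x, k):
    # Naive k x k average pooling with stride k on a (C, H, W) tensor.
    # Non-parametric: there are no weights to prune.
    C, H, W = x.shape
    out = np.zeros((C, H // k, W // k))
    for i in range(H // k):
        for j in range(W // k):
            out[:, i, j] = x[:, i*k:(i+1)*k, j*k:(j+1)*k].mean(axis=(1, 2))
    return out

def pool_as_depthwise_conv(x, k):
    # Equivalent depthwise convolution: one k x k filter per channel,
    # every weight initialized to 1/k^2, stride k. The operator is now
    # parametric, so its per-channel filters can be scored and pruned.
    C, H, W = x.shape
    w = np.full((C, k, k), 1.0 / (k * k))  # weights (fixed here, tunable in a real model)
    out = np.zeros((C, H // k, W // k))
    for i in range(H // k):
        for j in range(W // k):
            patch = x[:, i*k:(i+1)*k, j*k:(j+1)*k]
            out[:, i, j] = (patch * w).sum(axis=(1, 2))
    return out

# The two forms are numerically equivalent on any input.
x = np.random.rand(3, 8, 8)
assert np.allclose(avg_pool2d(x, 2), pool_as_depthwise_conv(x, 2))
```

In this view, "aggressive fusion" can be understood as applying equivalence-preserving rewrites of this kind so that more of the network falls inside the pruning optimization space.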