For practical deep neural network design on mobile devices, it is essential to consider the constraints imposed by limited computational resources and by the inference latency of various applications. Among approaches to deep network acceleration, pruning is widely adopted to balance computational resource consumption against accuracy: unimportant connections are removed either channel-wise or randomly, with minimal impact on model accuracy. Channel pruning yields an immediate, significant latency reduction, while random weight pruning is more flexible for trading off latency against accuracy. In this paper, we present a unified framework with Joint Channel pruning and Weight pruning (JCW), which achieves a better Pareto frontier between latency and accuracy than previous model compression approaches. To fully optimize this trade-off, we develop a tailored multi-objective evolutionary algorithm within the JCW framework, which enables a single search to obtain optimal candidate architectures for various deployment requirements. Extensive experiments demonstrate that JCW achieves a better trade-off between latency and accuracy than various state-of-the-art pruning methods on the ImageNet classification dataset. Our code is available at https://github.com/jcw-anonymous/JCW.
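The multi-objective search described above selects candidate architectures that are Pareto-optimal with respect to latency and accuracy. As a minimal illustration of that selection step (not the authors' implementation; the representation of candidates as hypothetical `(latency_ms, error_rate)` pairs is an assumption), the non-dominated set can be extracted as follows:

```python
from typing import List, Tuple

# A candidate is a hypothetical (latency_ms, error_rate) pair; lower is better for both.
Candidate = Tuple[float, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates not dominated by any other candidate."""
    return [
        c for i, c in enumerate(candidates)
        if not any(dominates(o, c) for j, o in enumerate(candidates) if j != i)
    ]

# Example: architectures found during one search, to be filtered once and
# deployed under different latency budgets.
population = [(10.0, 0.30), (12.0, 0.25), (15.0, 0.20), (11.0, 0.35), (20.0, 0.20)]
front = pareto_front(population)
```

In an evolutionary loop, this filter is applied each generation so that a single search retains the whole latency-accuracy frontier rather than one operating point.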