Besides accuracy, the model size of convolutional neural network (CNN) models is another important factor given the limited hardware resources in practical applications. For example, deploying deep neural networks on mobile systems requires accurate yet fast CNN models to achieve low latency in classification and object detection. To fulfill this need, we aim to obtain CNN models with both high test accuracy and small size to address the resource constraints of many embedded devices. In particular, this paper proposes a generic reinforcement learning-based model compression approach with a two-stage compression pipeline: pruning and quantization. The first stage, pruning, exploits deep reinforcement learning (DRL) to co-learn the accuracy and the FLOPs updated after layer-wise channel pruning and element-wise variational pruning via information dropout. The second stage, quantization, uses a similar DRL approach but focuses on obtaining the optimal bit representation for individual layers. We conduct experiments on the CIFAR-10 and ImageNet datasets. On CIFAR-10, the proposed method reduces the size of VGGNet by 9x, from 20.04MB to 2.2MB, with a slight increase in accuracy. On ImageNet, the proposed method reduces the size of VGG-16 by 33x, from 138MB to 4.14MB, with no loss of accuracy.
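For intuition only, the following minimal sketch (not the paper's exact formulation) illustrates the quantization stage: an agent's action assigns one bit width per layer, the weights are uniformly quantized, and a hypothetical reward trades accuracy off against a FLOPs budget. The function names, the symmetric uniform quantizer, and the reward form are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize_uniform(weights, num_bits):
    """Symmetric uniform quantization of a weight tensor to `num_bits` bits (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(weights).max() / qmax or 1.0  # avoid division by zero for all-zero layers
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale  # de-quantized weights, used when evaluating accuracy

def reward(accuracy, flops, flops_budget, penalty=1.0):
    """Hypothetical reward shaping: favor accuracy, penalize exceeding the FLOPs budget."""
    return accuracy - penalty * max(0.0, flops / flops_budget - 1.0)

# Example: the agent's action is a list of per-layer bit widths.
layers = [np.random.randn(64, 3, 3, 3), np.random.randn(128, 64, 3, 3)]
action = [8, 4]  # bit widths chosen for each layer
quantized = [quantize_uniform(w, b) for w, b in zip(layers, action)]
```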