Convolutional neural networks (CNNs) have achieved remarkable success, but they typically come with high computational cost and numerous redundant weight parameters. To reduce FLOPs, structured pruning is a popular approach that removes entire hidden structures by introducing coarse-grained sparsity. Meanwhile, many pruning works instead leverage fine-grained sparsity (where the zeroed weights are randomly distributed), yet the resulting sparse models lack specially designed computing libraries to realize the potential speedup. In this technical report, we study and present an efficient convolutional neural network inference system, FSCNN, which accelerates the forward pass by exploiting the fine-grained sparsity of compressed CNNs. FSCNN is built on a set of specially designed sparse data structures, operators, and associated algorithms. Experimentally, we validate that FSCNN outperforms the standard deep learning library PyTorch on popular CNN architectures such as VGG16 when the sparsity is sufficiently high. However, due to the memory contiguity issue of sparse operators, FSCNN is generally not competitive with highly optimized dense operators. Therefore, we recommend coarse-grained (structured) sparsity for generic model compression.
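To make the setting concrete, the sketch below shows one common way fine-grained sparsity can be exploited in a convolution's forward pass: the pruned weight matrix is stored in a CSR-style format and multiplied against a dense im2col buffer. This is an illustrative sketch only; the type and function names (CsrMatrix, spmmCsr) are hypothetical and are not FSCNN's actual data structures or API. The irregular column indices also illustrate the contiguity issue mentioned above.

// Illustrative sketch: CSR-style sparse weights times a dense im2col buffer.
// Names are hypothetical and not taken from FSCNN.
#include <cstddef>
#include <vector>

// Compressed Sparse Row storage for a pruned weight matrix of shape rows x cols.
struct CsrMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<float> values;        // nonzero weights
    std::vector<std::size_t> colIdx;  // column index of each nonzero
    std::vector<std::size_t> rowPtr;  // size rows + 1; nonzeros of row r lie in [rowPtr[r], rowPtr[r+1])
};

// Computes out = W * X, where W is sparse (CSR) and X is a dense (W.cols x numCols)
// im2col matrix stored row-major. Only nonzero weights are touched, so the cost
// scales with the number of weights that survive pruning.
void spmmCsr(const CsrMatrix& W, const std::vector<float>& X,
             std::size_t numCols, std::vector<float>& out) {
    out.assign(W.rows * numCols, 0.0f);
    for (std::size_t r = 0; r < W.rows; ++r) {
        for (std::size_t k = W.rowPtr[r]; k < W.rowPtr[r + 1]; ++k) {
            const float w = W.values[k];
            const std::size_t c = W.colIdx[k];
            // Rows of X are visited in the irregular order dictated by colIdx,
            // which is exactly the memory-contiguity penalty of sparse operators.
            for (std::size_t j = 0; j < numCols; ++j)
                out[r * numCols + j] += w * X[c * numCols + j];
        }
    }
}

Compared with a dense GEMM over the same shapes, this loop skips all pruned weights, but it sacrifices the regular, cache-friendly access pattern that highly optimized dense kernels rely on, which is why high sparsity is needed before such an approach pays off.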