Even though fine-grained pruning techniques achieve a high compression ratio, conventional sparsity representations (such as CSR) associated with irregular sparsity degrade parallelism significantly. Practical pruning methods therefore usually lower pruning rates (e.g., via structured pruning) to preserve parallelism. In this paper, we study a fixed-to-fixed (lossless) encoding architecture and algorithm that supports fine-grained pruning methods, so that sparse neural networks can be stored in a highly regular structure. We first estimate the maximum compression ratio of encoding-based compression using entropy. Then, to push the compression ratio toward this entropy-based theoretical maximum, we propose a sequential fixed-to-fixed encoding scheme. We demonstrate that the proposed compression scheme achieves nearly the maximum compression ratio for Transformer and ResNet-50 models pruned by various fine-grained pruning methods.
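As a rough illustration of the entropy argument (a sketch under the standard assumption that the pruning-mask bits are i.i.d. Bernoulli with nonzero probability p; the paper's actual estimate may account for additional structure):

```latex
% Per-bit entropy of an i.i.d. Bernoulli(p) pruning mask, where p is the
% fraction of nonzero weights:
%   H(p) = -p log2 p - (1-p) log2 (1-p)   [bits per mask bit]
% Shannon's source-coding theorem bounds any lossless encoding of the mask,
% so the achievable compression ratio of the mask is at most 1 / H(p).
% Example: p = 0.1 (90% sparsity) gives H(0.1) ≈ 0.469, i.e. at most ~2.13x.
\[
  H(p) = -p\log_2 p \;-\; (1-p)\log_2(1-p),
  \qquad
  \text{compression ratio} \;\le\; \frac{1}{H(p)}.
\]
```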