This work focuses on pruning selected convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The Nvidia deep neural network (cuDNN) library is the most effective implementation of deep learning (DL) algorithms for GPUs, and GPUs are the most commonly used accelerators for deep learning computations. One of the most common techniques for improving the efficiency of CNN models is weight pruning combined with quantization. There are two main types of pruning: structural and non-structural. The first enables much easier acceleration on many types of accelerators, but with this type it is difficult to achieve sparsity levels and accuracy as high as those obtained with the second type. Non-structural pruning with retraining can produce weight tensors with 90% or more sparsity in some deep CNN models. In this article, a pruning algorithm is presented that makes it possible to achieve high sparsity levels without an accuracy drop. In the next stage, linear and non-linear quantization are adapted for further reduction of execution time and memory footprint. This paper is an extended version of a previously published paper on effective pruning techniques; it presents real models pruned to high sparsity and reduced precision that can achieve better performance than the cuDNN library.
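To illustrate the kind of non-structural pruning and linear quantization referred to above, the following minimal NumPy sketch is included; it is not the algorithm developed in this paper, and the global magnitude threshold, the function names, and the 8-bit setting are illustrative assumptions only.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that roughly the
    requested fraction of entries (e.g. 0.9 for 90% sparsity) is zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return weights * (np.abs(weights) > threshold)

def linear_quantize(weights: np.ndarray, bits: int = 8):
    """Uniform (linear) quantization to a signed integer grid;
    returns integer codes and the scale needed to dequantize."""
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    codes = np.round(weights / scale).astype(np.int8)
    return codes, scale

# Hypothetical example: prune a random 4D convolution kernel to 90%
# sparsity, then quantize the surviving weights to 8 bits.
w = np.random.randn(64, 64, 3, 3).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.9)
w_q, s = linear_quantize(w_sparse, bits=8)
print(f"achieved sparsity: {np.mean(w_sparse == 0):.2%}, scale: {s:.5f}")
```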