Weight pruning is a technique for making Deep Neural Network (DNN) inference more computationally efficient by reducing the number of model parameters over the course of training. However, most weight pruning techniques do not speed up DNN training and can even require more iterations to reach model convergence. In this work, we propose a novel Structured Data Gradient Pruning (SDGP) method that can speed up training without impacting model convergence. This approach enforces a specific sparsity structure, where only N out of every M elements in a matrix can be nonzero, making it amenable to hardware acceleration. Modern accelerators such as the Nvidia A100 GPU support this type of structured sparsity for 2 nonzeros per 4 elements in a reduction. Assuming hardware support for 2:4 sparsity, our approach can achieve a 15-25\% reduction in total training time without a significant impact on model performance. Source code and pre-trained models are available at \url{https://github.com/BradMcDanel/sdgp}.
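As a concrete illustration of the N:M constraint described above, the following is a minimal PyTorch sketch (not the authors' exact SDGP implementation) that keeps the N largest-magnitude values in each contiguous group of M elements and zeros out the rest; the function name \texttt{nm\_prune} and the magnitude-based selection rule are illustrative assumptions.

\begin{verbatim}
import torch

def nm_prune(x: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Zero out all but the n largest-magnitude entries in each group of m.

    Assumes x.numel() is divisible by m. Illustrative sketch only.
    """
    orig_shape = x.shape
    groups = x.reshape(-1, m)                 # view tensor as groups of m elements
    _, idx = groups.abs().topk(n, dim=1)      # indices of the n largest magnitudes
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(1, idx, True)               # keep only the selected positions
    return (groups * mask).reshape(orig_shape)

# Example: enforce 2:4 sparsity on a data gradient before the backward matmul.
grad = torch.randn(8, 16)
sparse_grad = nm_prune(grad, n=2, m=4)
assert (sparse_grad.reshape(-1, 4) != 0).sum(dim=1).max() <= 2
\end{verbatim}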