Machine/deep-learning (ML/DL) based techniques are emerging as a driving force behind many cutting-edge technologies, achieving high accuracy on computer vision workloads such as image classification and object detection. However, training these models, which involve a large number of parameters, is both time-consuming and energy-intensive. In this regard, several prior works have advocated exploiting sparsity to speed up DL training and, even more so, the inference phase. This work begins with the observation that during training, sparsity in the forward and backward passes is correlated. In that context, we investigate two types of sparsity (input and output) inherent in gradient descent-based optimization algorithms and propose a hardware micro-architecture to leverage both. Our experiments, using five state-of-the-art CNN models on the ImageNet dataset, show backpropagation speedups in the range of 1.69$\times$ to 5.43$\times$ compared to the dense baseline execution. By exploiting sparsity in both the forward and backward passes, speedups range from 1.68$\times$ to 3.30$\times$ over the sparsity-agnostic baseline execution. Our design also achieves a significant reduction in training iteration time over several previously proposed dense as well as sparse accelerator platforms, in addition to order-of-magnitude energy-efficiency improvements over GPU-based execution.
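As a minimal illustration of the correlation claim (our own sketch, not the paper's micro-architecture), consider a ReLU layer: zeros produced in the forward pass ("output sparsity") determine which gradients are zeroed in the backward pass ("input sparsity"), so the two sparsity patterns coincide by construction. The NumPy snippet below assumes this ReLU-induced mechanism for illustration only.

```python
import numpy as np

# Toy sketch (assumption: sparsity arises from ReLU; not the paper's accelerator).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))          # pre-activation inputs
y = np.maximum(x, 0.0)                   # forward pass: ReLU output, roughly half zeros
grad_y = rng.standard_normal(y.shape)    # incoming gradient from the next layer
grad_x = grad_y * (x > 0)                # backward pass: gradient w.r.t. pre-activations

fwd_mask = (y == 0)
bwd_mask = (grad_x == 0)
# Every zero in the forward output implies a zero gradient in the backward pass,
# so forward-pass sparsity can be reused to skip backward-pass work.
assert np.all(bwd_mask[fwd_mask])
print(f"forward sparsity: {fwd_mask.mean():.2f}, backward sparsity: {bwd_mask.mean():.2f}")
```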