Deep Learning (DL) acceleration support in CPUs has recently gained significant traction, with several companies (Arm, Intel, IBM) announcing products with specialized matrix engines accessible via GEMM instructions. CPUs are pervasive and need to handle diverse requirements across DL workloads running on edge/HPC/cloud platforms. Therefore, as DL workloads embrace sparsity to reduce the computation and memory footprint of models, it is also imperative for CPUs to add support for sparsity to avoid under-utilization of the dense matrix engine and inefficient usage of the caches and registers. This work presents VEGETA, a set of ISA and microarchitecture extensions over dense matrix engines to support flexible structured sparsity for CPUs, enabling programmable support for diverse DL models with varying degrees of sparsity. Compared to the state-of-the-art (SOTA) dense matrix engine in CPUs, a VEGETA engine provides 1.09x, 2.20x, 3.74x, and 3.28x speed-ups when running 4:4 (dense), 2:4, 1:4, and unstructured (95%) sparse DNN layers, respectively.
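To make the N:M notation above concrete, here is a minimal NumPy sketch (illustrative only, not the VEGETA ISA or microarchitecture) of 2:4 structured pruning and the values-plus-indices compressed layout that a sparse matrix engine typically consumes; all function and variable names are hypothetical.

```python
# N:M structured sparsity: in every group of M consecutive weights,
# at most N may be nonzero. The nonzeros are stored densely alongside
# small per-element indices, which is what lets a matrix engine skip zeros.
import numpy as np

def prune_n_m(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Zero out all but the n largest-magnitude values in each group of m."""
    rows, cols = weights.shape
    assert cols % m == 0, "row length must be a multiple of m"
    groups = weights.reshape(rows, cols // m, m)
    # Rank elements within each group by magnitude; keep the top n.
    order = np.argsort(-np.abs(groups), axis=-1)
    keep = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(keep, order[..., :n], True, axis=-1)
    return (groups * keep).reshape(rows, cols)

def compress_n_m(pruned: np.ndarray, n: int = 2, m: int = 4):
    """Pack an N:M-pruned matrix into (values, indices) arrays."""
    rows, cols = pruned.shape
    groups = pruned.reshape(rows, cols // m, m)
    # Positions of the n surviving values within each group, in ascending order.
    idx = np.sort(np.argsort(-np.abs(groups), axis=-1, kind="stable")[..., :n], axis=-1)
    vals = np.take_along_axis(groups, idx, axis=-1)
    return vals.reshape(rows, -1), idx.reshape(rows, -1).astype(np.uint8)

w = np.random.randn(4, 8).astype(np.float32)
w24 = prune_n_m(w, n=2, m=4)      # 2:4 -- half the weights survive
vals, idx = compress_n_m(w24)     # values + 2-bit metadata (held in uint8 here)
print(vals.shape, idx.shape)      # (4, 4) values, (4, 4) indices
```

Under this scheme, 4:4 degenerates to the dense case, while 1:4 keeps one value per group of four, matching the sparsity levels evaluated in the abstract.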