Recently, numerous sparse hardware accelerators have been proposed for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications. All of these accelerators target tensor algebra (typically matrix multiplication), yet dozens of new accelerators are proposed for every new application. The reason is that the size and sparsity of a workload heavily influence which architecture is best for memory and computation efficiency. To satisfy the growing demand for efficient computation across a spectrum of workloads in large data centers, we propose deploying a flexible 'heterogeneous' accelerator, which contains many 'sub-accelerators' (smaller specialized accelerators) working together. To this end, we propose: (1) HARD TACO, a quick and productive C++ to RTL design flow that generates many types of sub-accelerators for sparse and dense computations, enabling fair design-space exploration, (2) AESPA, a heterogeneous sparse accelerator design template constructed from the sub-accelerators generated by HARD TACO, and (3) a suite of scheduling strategies that map tensor kernels onto heterogeneous sparse accelerators with high efficiency and utilization. On our diverse workload suite, AESPA with optimized scheduling achieves 1.96X higher performance and 7.9X better energy-delay product (EDP) than a homogeneous EIE-like accelerator.