Machine learning, particularly deep neural network inference, has become a vital workload for many computing systems, from data centers and HPC systems to edge-based computing. While advances in sparsity have helped improve the efficiency of AI acceleration, there remains a continued need for greater system efficiency in both high-performance and system-level acceleration. This work takes a unique look at sparsity through an event-driven (or activation-driven) approach to ANN acceleration that aims to minimize useless work, improve utilization, and increase performance and energy efficiency. Our analytical and experimental results show that this event-driven solution offers a new direction for highly efficient AI inference across both CNN and MLP workloads. The design achieves state-of-the-art energy efficiency and performance (at 30 fps), centered on activation-based sparsity and a highly parallel dataflow method that improves overall functional-unit utilization, and it improves energy efficiency by 1.46$\times$ over a state-of-the-art solution. Taken together, this methodology presents a promising new direction for high-efficiency, high-performance designs in next-generation AI acceleration platforms.