Machine learning (ML) models are widely used in many important domains. To process these computation- and memory-intensive applications efficiently, the tensors of these over-parameterized models are compressed by leveraging sparsity, size reduction, and quantization. Unstructured sparsity and tensors with varying dimensions yield irregular computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does not inherently leverage the acceleration opportunities. This paper provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators. In particular, it discusses enhancement modules in the architecture design and the software support; categorizes different hardware designs and acceleration techniques and analyzes them in terms of hardware and execution costs; analyzes achievable accelerations for recent DNNs; and highlights further opportunities in terms of hardware/software/model co-design optimizations (inter/intra-module). The takeaways from this paper include: understanding the key challenges in accelerating sparse, irregular-shaped, and quantized tensors; understanding enhancements in accelerator systems for supporting their efficient computation; analyzing the trade-offs of specific design choices for encoding, storing, extracting, communicating, computing, and load-balancing the non-zeros; understanding how structured sparsity can improve storage efficiency and balance computations; understanding how to compile and map models with sparse tensors onto the accelerators; and understanding recent design trends for efficient acceleration and further opportunities.
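To make the notions of encoding, storing, and extracting non-zeros concrete, the following is a minimal sketch in Python, not taken from the paper. It assumes compressed sparse row (CSR) as the encoding, which is only one of many possible formats, and the helper names `dense_to_csr`, `csr_matvec`, and `nonzeros_per_row` are hypothetical. The per-row non-zero counts hint at why unstructured sparsity can lead to unbalanced work across processing elements.

```python
from typing import List, Sequence, Tuple

Number = float


def dense_to_csr(dense: Sequence[Sequence[Number]]) -> Tuple[List[Number], List[int], List[int]]:
    """Encode a dense matrix in CSR form (assumed format, for illustration).

    Returns (values, col_idx, row_ptr): only non-zero values are stored,
    together with their column positions and per-row offsets.
    """
    values: List[Number] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends inside `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values: Sequence[Number], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[Number]) -> List[Number]:
    """Multiply a CSR matrix by a dense vector, touching only non-zeros."""
    y: List[Number] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Extract the non-zeros of row i and gather the matching x entries.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def nonzeros_per_row(row_ptr: Sequence[int]) -> List[int]:
    """Work per row; uneven counts indicate potential load imbalance."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]


if __name__ == "__main__":
    dense = [
        [0, 2, 0, 0],
        [1, 0, 0, 3],
        [0, 0, 0, 0],
    ]
    values, col_idx, row_ptr = dense_to_csr(dense)
    print(values, col_idx, row_ptr)
    print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3, 4]))
    print(nonzeros_per_row(row_ptr))
```

In this sketch the 3x4 matrix is stored with three values instead of twelve, and the multiply performs three multiply-accumulates instead of twelve. The gather `x[col_idx[k]]` is an irregular memory access, and the rows carry 1, 2, and 0 non-zeros respectively, which illustrates the kind of imbalance that structured sparsity is meant to reduce.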