Sparse tensor computation is a core component of numerous applications in areas such as data science, graph processing, and scientific computing. Sparse tensors offer the potential to skip the unnecessary computations caused by zero values. In this paper, we propose a new strategy for extending row-wise-product sparse tensor accelerators. We introduce a new processing element, called Maple, that uses multiple multiply-accumulate (MAC) units to exploit local clusters of non-zero values, increasing parallelism and reducing data movement. Maple operates on the compressed sparse row (CSR) format and, guided by the sparsity pattern, computes only the non-zero elements of the input matrices. Furthermore, Maple can serve as a basic building block in a variety of spatial tensor accelerators that follow a row-wise product approach. As a proof of concept, we integrate Maple into two reference accelerators, ExTensor and MatRaptor. Our experiments show that using Maple in MatRaptor and ExTensor yields 50% and 60% energy savings and 15% and 22% speedup over the baseline designs, respectively. Maple also reduces area by 5.9x and 15.5x in MatRaptor and ExTensor, respectively, compared with the baseline structures.
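To make the row-wise product approach concrete, the following is a minimal software sketch (not the paper's hardware design) of Gustavson-style sparse matrix multiplication over CSR: for each non-zero A[i, k], the non-zero row k of B is scaled and accumulated into row i of the output, so only non-zero operands ever reach the MAC operations. The function name and CSR array layout here are illustrative assumptions.

```python
# Illustrative sketch: row-wise product (Gustavson) SpGEMM on CSR inputs.
# CSR layout assumed: ptr[i]..ptr[i+1] delimits the non-zeros of row i,
# with their column indices in idx and values in val.
def spgemm_rowwise(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_rows):
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(n_rows):
        acc = {}  # column -> partial sum for row i of the output
        for p in range(a_ptr[i], a_ptr[i + 1]):      # non-zeros A[i, k]
            k, a = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):  # non-zeros B[k, j]
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a * b_val[q]  # MAC: only non-zero pairs
        for j in sorted(acc):                        # emit row i in CSR order
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

In hardware, a processing element in this style would parallelize the inner accumulation across MAC units; the sketch only illustrates the dataflow and the zero-skipping property.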