Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high-bandwidth memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. In this paper, we present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing to support the efficient processing of arbitrary SpMVs. From the evaluation of twelve large-size matrices, Serpens is 1.91x and 1.76x better in terms of geomean throughput than the latest accelerators GraphLily and Sextans, respectively. We also evaluate 2,519 SuiteSparse matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. For energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and 6.25x/4.06x better compared with GraphLily, Sextans, and the K80, respectively. After scaling up to 24 HBM channels, Serpens achieves up to 60.55 GFLOP/s (30,204 MTEPS) and up to 3.79x over GraphLily. The code is available at https://github.com/UCLA-VAST/Serpens.
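For readers unfamiliar with the operation, the computation the abstract describes can be sketched with a minimal CSR (compressed sparse row) SpMV in plain Python. This is only an illustration of the general y = Ax kernel; it does not reflect Serpens's actual on-chip format, processing engines, or index-coalescing scheme.

```python
def spmv_csr(indptr, indices, data, x):
    """Multiply a CSR-format sparse matrix by a dense vector x.

    indptr[row]..indptr[row+1] delimits the nonzeros of each row in
    the parallel arrays `data` (values) and `indices` (column ids).
    The gather x[indices[k]] is the random-access pattern that makes
    SpMV memory-bound and motivates HBM-based accelerators.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

# 3x3 example:  [[2, 0, 1],
#                [0, 3, 0],
#                [4, 0, 5]] @ [1, 2, 3]
indptr  = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data    = [2.0, 1.0, 3.0, 4.0, 5.0]
print(spmv_csr(indptr, indices, data, [1.0, 2.0, 3.0]))  # [5.0, 6.0, 19.0]
```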