ScalaBFS: A Scalable BFS Accelerator on HBM-Enhanced FPGAs


October 12, 2021


Kexin Li, Chenhao Liu, Zhiyuan Shao, Zeke Wang, Minkang Wu, Jiajie Chen, Xiaofei Liao, Hai Jin

High Bandwidth Memory (HBM) provides massive aggregate memory bandwidth by exposing multiple memory channels to the processing units. To achieve high performance, an accelerator built on an FPGA equipped with HBM (i.e., an FPGA-HBM platform) needs to scale its performance with the number of available memory channels. In this paper, we propose ScalaBFS, an accelerator for the BFS (Breadth-First Search) algorithm that builds multiple processing elements to exploit the high bandwidth of HBM and improve efficiency. We implement a prototype of ScalaBFS and run BFS on both real-world and synthetic scale-free graphs on a real Xilinx Alveo U280 FPGA card. The experimental results show that ScalaBFS scales its performance almost linearly with the number of available memory pseudo channels (PCs) of the U280's HBM2 subsystem. By fully using the 32 PCs and building 64 processing elements (PEs) on the U280, ScalaBFS achieves a performance of up to 19.7 GTEPS (Giga Traversed Edges Per Second). When conducting BFS on sparse real-world graphs, ScalaBFS achieves GTEPS equivalent to Gunrock running on the state-of-the-art Nvidia V100 GPU, which features 64-PC HBM2 (twice the memory bandwidth of the U280).
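To make the GTEPS metric concrete, here is a minimal software sketch of a level-synchronous BFS that counts the edges it inspects and converts that count into GTEPS. It is not the ScalaBFS hardware design. The function names (`bfs_levels`, `gteps`), the toy graph, and the convention of counting every inspected adjacency entry as one traversed edge are assumptions made for illustration only.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency-list graph (illustrative only).

    Returns (level, edges_traversed): level maps each reached vertex to its
    BFS depth; edges_traversed counts every adjacency entry inspected
    (an assumed counting convention, other conventions exist).
    """
    level = {source: 0}
    frontier = [source]
    edges_traversed = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                edges_traversed += 1
                if v not in level:  # first visit fixes the BFS depth
                    level[v] = level[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return level, edges_traversed


def gteps(edges_traversed, seconds):
    """Giga Traversed Edges Per Second for a run of the given duration."""
    return edges_traversed / seconds / 1e9


# Hypothetical toy graph: undirected edges stored in both directions.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
level, edges = bfs_levels(adj, 0)
print(level, edges)
```

Under this convention, a run that inspects 19.7 billion edges in one second would be reported as 19.7 GTEPS by `gteps`. The abstract's configuration of 64 PEs on 32 PCs also works out to two PEs per pseudo channel.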


