Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver - Zhuanzhi Papers


Precision/Accuracy · Conjugate Gradient · Conjugate · Processing (programming language) · Stream

September 28, 2022

Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver


Linghao Song,Licheng Guo,Suhail Basalama,Yuze Chi,Robert F. Lucas,Jason Cong

The continued growth in the processing power of FPGAs, coupled with high bandwidth memories (HBM), makes systems like the Xilinx U280 credible platforms for the linear solvers which often dominate the run time of scientific and engineering applications. In this paper we present Callipepla, an accelerator for a preconditioned conjugate gradient linear solver (CG). FPGA acceleration of CG faces three challenges: (1) how to support an arbitrary problem and terminate acceleration processing on the fly, (2) how to coordinate long-vector data flow among processing modules, and (3) how to save off-chip memory bandwidth and maintain double (FP64) precision accuracy. To tackle the three challenges, we present (1) a stream-centric instruction set for efficient streaming processing and control, (2) decentralized vector flow scheduling to coordinate vector data flow among modules and further reduce off-chip memory accesses with a double memory channel design, and (3) a mixed precision scheme to save bandwidth yet still achieve effective double precision quality solutions. We prototype the accelerator on a Xilinx U280 HBM FPGA. Our evaluation shows that, compared to the Xilinx HPC product, the XcgSolver, Callipepla achieves a speedup of 3.94x, 3.36x higher throughput, and 2.94x better energy efficiency. Compared to an NVIDIA A100 GPU, which has 4x the memory bandwidth of Callipepla, we still achieve 77% of its throughput with 3.34x higher energy efficiency.
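To make the mixed precision idea concrete, here is a minimal software sketch of a preconditioned CG loop. The abstract does not say which operands are kept at reduced precision or which preconditioner is used, so the choices below are assumptions for illustration only: the matrix is stored in FP32, all vectors and scalars stay in FP64, and the preconditioner is a simple Jacobi (diagonal) one. The function name `pcg_mixed` and its parameters are hypothetical, and the residual check at the top of the loop only loosely mirrors the "terminate on the fly" requirement.

```python
import numpy as np


def pcg_mixed(a_fp32, b, tol=1e-12, max_iter=1000):
    """Jacobi-preconditioned CG (illustrative sketch, not the paper's design).

    Assumption: the matrix is stored in FP32 to cut memory traffic,
    while vectors and scalar reductions are kept in FP64.
    """
    a_fp32 = np.asarray(a_fp32, dtype=np.float32)      # reduced-precision storage
    b = np.asarray(b, dtype=np.float64)
    m_inv = 1.0 / np.diag(a_fp32).astype(np.float64)   # Jacobi preconditioner M^-1

    x = np.zeros_like(b)
    r = b.copy()                # r = b - A x with x = 0
    z = m_inv * r               # z = M^-1 r
    p = z.copy()
    rz = r @ z
    threshold = (tol ** 2) * (b @ b)

    for it in range(max_iter):
        if r @ r <= threshold:  # stop as soon as the residual is small enough
            return x, it
        ap = a_fp32 @ p         # NumPy promotes the FP32 matrix to FP64 here
        alpha = rz / (p @ ap)
        x += alpha * p
        r -= alpha * ap
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    n = 64
    # Symmetric positive definite test matrix with FP32-exact entries.
    a = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    x, iters = pcg_mixed(a, np.ones(n))
    print("iterations:", iters)
    print("max residual:", np.abs(a @ x - 1.0).max())
```

In this sketch only the matrix values take the smaller format, since the matrix dominates the data read in each iteration; keeping the vectors in FP64 is what lets the iterates reach a double precision quality residual for a matrix whose entries are exactly representable in FP32.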

