An FPGA cached sparse matrix vector product (SpMV) for unstructured computational fluid dynamics simulations


Vectorization · FPGA · Dynamics Simulation · Sparse · Performance

July 24, 2021


Guillermo Oyarzun, Daniel Peyrolon, Carlos Alvarez, Xavier Martorell

Field Programmable Gate Arrays (FPGAs) generate algorithm-specific architectures that improve a code's FLOP-per-watt ratio. Such devices are regaining interest due to the rise of new tools that facilitate their programming, such as OmpSs. The computational fluid dynamics (CFD) community is always investigating new architectures that can improve the performance of its algorithms. Commonly, those algorithms have a low arithmetic intensity and reach only a small percentage of peak performance. Sparse matrix-vector multiplication (SpMV) is one of the most time-consuming operations in unstructured simulations. The matrix's sparsity pattern determines the indirect memory accesses to the multiplying vector. This data path is hard to predict, which makes traditional implementations fail. In this work, we present an FPGA architecture that maximizes the vector's reusability by introducing a cache-like architecture. The cache is implemented as a circular list that keeps vector components in BRAM for as long as they are needed. Following this strategy, a speedup of up to 16 times is obtained compared to a naive implementation of the algorithm.
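To make the access pattern concrete, the sketch below is a plain software model, not the authors' FPGA design, OmpSs code, or an HLS kernel. It shows a reference CSR SpMV, where the read `x[col_idx[k]]` is the indirect, sparsity-dependent access the abstract refers to, and then routes those reads through a small fixed-capacity circular buffer that stands in for on-chip BRAM. The class and function names (`CircularVectorCache`, `spmv_cached`) are hypothetical, and the FIFO replacement policy (overwrite the oldest slot when full) is an assumption for illustration; the paper's circular list keeps entries "while needed", which may follow a different eviction rule.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Reference CSR SpMV, y = A @ x. Every x read is indirect via col_idx."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


class CircularVectorCache:
    """Hypothetical model of a BRAM-like buffer of x entries.

    Slots form a circular list; on a miss the slot at `head` (the oldest
    entry) is overwritten, i.e. an assumed FIFO replacement policy.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.tags = [None] * capacity    # column index held in each slot
        self.values = [0.0] * capacity   # cached copy of x[column]
        self.where = {}                  # column index -> slot
        self.head = 0                    # next slot to overwrite
        self.hits = 0
        self.misses = 0                  # each miss models an off-chip read

    def load(self, x, col):
        slot = self.where.get(col)
        if slot is not None:
            self.hits += 1
            return self.values[slot]
        self.misses += 1
        old = self.tags[self.head]
        if old is not None:
            del self.where[old]          # evict the oldest entry
        self.tags[self.head] = col
        self.values[self.head] = x[col]
        self.where[col] = self.head
        self.head = (self.head + 1) % self.capacity
        return self.values[self.where[col]]


def spmv_cached(row_ptr, col_idx, vals, x, cache):
    """Same CSR SpMV, but every x read goes through the circular cache."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * cache.load(x, col_idx[k])
        y[i] = acc
    return y


# Usage: 4x4 tridiagonal matrix (a 1D-mesh-like stencil) in CSR form.
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = [1.0, 2.0, 3.0, 4.0]

cache = CircularVectorCache(capacity=3)
y = spmv_cached(row_ptr, col_idx, vals, x, cache)
print(y, cache.hits, cache.misses)
```

In this toy case the naive kernel performs one x read per nonzero (10 reads), while a 3-slot buffer serves 6 of them on chip and fetches each of the 4 distinct columns only once; a 1-slot buffer gets no reuse at all. How much reuse a given capacity captures depends on how far apart repeated accesses to the same column are, which is exactly what the sparsity pattern of an unstructured mesh determines.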

