计算内存: DL- 优化 FPGAs 的可调适的计算和存储区块 (Compute RAMs: Adaptable Compute and Storage Blocks for DL-Optimized FPGAs) - 专知论文

会员服务 ·

0

块 · 可约的 · Storage · FPGA · 查准率/准确率 ·

2021 年 9 月 30 日

Compute RAMs: Adaptable Compute and Storage Blocks for DL-Optimized FPGAs

翻译：计算内存: DL- 优化 FPGAs 的可调适的计算和存储区块

Aman Arora,Bagus Hanindhito,Lizy K. John

from arxiv, 8 pages, IEEE Signal Processing Society's ASILOMAR Conference on Signals, Systems and Computers

The configurable building blocks of current FPGAs -- Logic blocks (LBs), Digital Signal Processing (DSP) slices, and Block RAMs (BRAMs) -- make them efficient hardware accelerators for the rapid-changing world of Deep Learning (DL). Communication between these blocks happens through an interconnect fabric consisting of switching elements spread throughout the FPGA. In this paper, a new block, Compute RAM, is proposed. Compute RAMs provide highly-parallel processing-in-memory (PIM) by combining computation and storage capabilities in one block. Compute RAMs can be integrated in the FPGA fabric just like the existing FPGA blocks and provide two modes of operation (storage or compute) that can be dynamically chosen. They reduce power consumption by reducing data movement, provide adaptable precision support, and increase the effective on-chip memory bandwidth. Compute RAMs also help increase the compute density of FPGAs. In our evaluation of addition, multiplication and dot-product operations across multiple data precisions (int4, int8 and bfloat16), we observe an average savings of 80% in energy consumption, and an improvement in execution time ranging from 20% to 80%. Adding Compute RAMs can benefit non-DL applications as well, and make FPGAs more efficient, flexible, and performant accelerators.

翻译：在当前 FPGA 切片 -- -- 逻辑区块(LBs)、数字信号处理(DSP) 和块内存(BRAM) 的可配置构件块 -- -- 逻辑区块(LBs)、数字信号处理(DSP) 和块内存(BRAMs) -- -- 使这些区块之间的沟通通过由切换元素构成的互连结构而发生。在本文中,提议了一个新的区块,即计算内存(Compete RAs) 。计算内存(PIM), 将计算和存储能力合并到一个区块内存(BRAMs) 。可以像现有的FPGA 区块那样,将内存带(DS) 的硬硬硬件加速器加速器(DRM) 整合到 FPGA 结构中, 在80 平均节能中,在80 节能中,在80 节能中,在80 节能中,在运行,在80 节能中,在80 节能中,在运行,在80次的节能中,在80次内,在运行,在80次的节能中,在80次的节能中,在80次的节能中,在80次的节能中,在80次的节能。

0

相关内容

FPGA加速深度学习综述

FPGA加速深度学习综述

专知会员服务

71+阅读 · 2021年11月13日

基于粗粒度数据流架构的稀疏卷积神经网络加速

专知会员服务

23+阅读 · 2021年7月15日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

【ICLR2020-Facebook 2020】深度学习符号化数学，Deep Learning for Symbolic Mathematics，

【ICLR2020-Facebook 2020】深度学习符号化数学，Deep Learning for Symbolic Mathematics，

专知会员服务

23+阅读 · 2020年4月7日

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

专知会员服务

61+阅读 · 2019年12月29日

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

专知会员服务

13+阅读 · 2019年11月17日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

已删除

将门创投

4+阅读 · 2019年8月22日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

前端高性能计算（4）：GPU加速计算

前端高性能计算（4）：GPU加速计算

前端大全

7+阅读 · 2017年10月26日

【推荐】基于TVM工具链的深度学习编译器 NNVM compiler发布

【推荐】基于TVM工具链的深度学习编译器 NNVM compiler发布

机器学习研究会

5+阅读 · 2017年10月7日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

AuctionWhisk: Using an Auction-Inspired Approach for Function Placement in Serverless Fog Platforms

Arxiv

0+阅读 · 2021年11月23日

NUMAscope: Capturing and Visualizing Hardware Metrics on Large ccNUMA Systems

Arxiv

0+阅读 · 2021年11月23日

A Lightweight Encryption Scheme for IoT Devices in the Fog

Arxiv

0+阅读 · 2021年11月23日

KML: Using Machine Learning to Improve Storage Systems

Arxiv

0+阅读 · 2021年11月22日

4D Segmentation Algorithm with application to 3D+time Image Segmentation

Arxiv

0+阅读 · 2021年11月21日

E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs

Arxiv

0+阅读 · 2021年11月19日

Efficiently Transforming Tables for Joinability

Arxiv

0+阅读 · 2021年11月18日

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Arxiv

9+阅读 · 2018年9月17日

BigDL: A Distributed Deep Learning Framework for Big Data

Arxiv

4+阅读 · 2018年4月16日

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Arxiv

4+阅读 · 2018年3月15日

VIP会员

文章信息

相关主题

查准率/准确率

相关VIP内容

FPGA加速深度学习综述

FPGA加速深度学习综述

专知会员服务

71+阅读 · 2021年11月13日

基于粗粒度数据流架构的稀疏卷积神经网络加速

专知会员服务

23+阅读 · 2021年7月15日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

【ICLR2020-Facebook 2020】深度学习符号化数学，Deep Learning for Symbolic Mathematics，

【ICLR2020-Facebook 2020】深度学习符号化数学，Deep Learning for Symbolic Mathematics，

专知会员服务

23+阅读 · 2020年4月7日

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

专知会员服务

61+阅读 · 2019年12月29日

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

专知会员服务

13+阅读 · 2019年11月17日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

AI CITY发展研究报告：“人工智能+”时代的智慧城市发展范式创新（2025年）

风格迁移：十年综述

【ICCV2025】CL-Splats：结合局部优化的高斯泼洒持续学习方法

【HKUST博士论文】迈向可扩展且具泛化能力的时空预测

相关资讯

已删除

将门创投

4+阅读 · 2019年8月22日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

前端高性能计算（4）：GPU加速计算

前端高性能计算（4）：GPU加速计算

前端大全

7+阅读 · 2017年10月26日

【推荐】基于TVM工具链的深度学习编译器 NNVM compiler发布

【推荐】基于TVM工具链的深度学习编译器 NNVM compiler发布

机器学习研究会

5+阅读 · 2017年10月7日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

AuctionWhisk: Using an Auction-Inspired Approach for Function Placement in Serverless Fog Platforms

Arxiv

0+阅读 · 2021年11月23日

NUMAscope: Capturing and Visualizing Hardware Metrics on Large ccNUMA Systems

Arxiv

0+阅读 · 2021年11月23日

A Lightweight Encryption Scheme for IoT Devices in the Fog

Arxiv

0+阅读 · 2021年11月23日

KML: Using Machine Learning to Improve Storage Systems

Arxiv

0+阅读 · 2021年11月22日

4D Segmentation Algorithm with application to 3D+time Image Segmentation

Arxiv

0+阅读 · 2021年11月21日

E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs

Arxiv

0+阅读 · 2021年11月19日

Efficiently Transforming Tables for Joinability

Arxiv

0+阅读 · 2021年11月18日

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Arxiv

9+阅读 · 2018年9月17日

BigDL: A Distributed Deep Learning Framework for Big Data

Arxiv

4+阅读 · 2018年4月16日

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Arxiv

4+阅读 · 2018年3月15日

微信扫码咨询专知VIP会员