计算内存: DL- 优化 FPGAs 的可调适的计算和存储区块 (Compute RAMs: Adaptable Compute and Storage Blocks for DL-Optimized FPGAs) - 专知论文

会员服务 ·

0

块 · 可约的 · Storage · FPGA · 查准率/准确率 ·

2021 年 7 月 19 日

Compute RAMs: Adaptable Compute and Storage Blocks for DL-Optimized FPGAs

翻译：计算内存: DL- 优化 FPGAs 的可调适的计算和存储区块

Aman Arora,Bagus Hanindhito,Lizy K. John

from arxiv, 8 pages, IEEE Signal Processing Society's ASILOMAR Conference on Signals, Systems and Computers

The configurable building blocks of current FPGAs -- Logic blocks (LBs), Digital Signal Processing (DSP) slices, and Block RAMs (BRAMs) -- make them efficient hardware accelerators for the rapid-changing world of Deep Learning (DL). Communication between these blocks happens through an interconnect fabric consisting of switching elements spread throughout the FPGA. In this paper, a new block, Compute RAM, is proposed. Compute RAMs provide highly-parallel processing-in-memory (PIM) by combining computation and storage capabilities in one block. Compute RAMs can be integrated in the FPGA fabric just like the existing FPGA blocks and provide two modes of operation (storage or compute) that can be dynamically chosen. They reduce power consumption by reducing data movement, provide adaptable precision support, and increase the effective on-chip memory bandwidth. Compute RAMs also help increase the compute density of FPGAs. In our evaluation of addition, multiplication and dot-product operations across multiple data precisions (int4, int8 and bfloat16), we observe an average savings of 80% in energy consumption, and an improvement in execution time ranging from 20% to 80%. Adding Compute RAMs can benefit non-DL applications as well, and make FPGAs more efficient, flexible, and performant accelerators.

翻译：在当前 FPGA 切片 -- -- 逻辑区块(LBs)、数字信号处理(DSP) 和块内存(BRAM) 的可配置构件块 -- -- 逻辑区块(LBs)、数字信号处理(DSP) 和块内存(BRAMs) -- -- 使这些区块之间的沟通通过由切换元素构成的互连结构而发生。在本文中,提议了一个新的区块,即计算内存(Compete RAs) 。计算内存(PIM), 将计算和存储能力合并到一个区块内存(BRAMs) 。可以像现有的FPGA 区块那样,将内存带(DS) 的硬硬硬件加速器加速器(DRM) 整合到 FPGA 结构中, 在80 平均节能中,在80 节能中,在80 节能中,在80 节能中,在运行,在80 节能中,在80 节能中,在运行,在80次的节能中,在80次内,在运行,在80次的节能中,在80次的节能中,在80次的节能中,在80次的节能中,在80次的节能。

0

相关内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

【Freddy Lecue博士】Thales嵌入式可解释AI：关键系统中AI的采用（Thales Embedded Explainable AI: Towards the Adoption of AI in Critical Systems.），AI Accelerator Summit 2019

【Freddy Lecue博士】Thales嵌入式可解释AI：关键系统中AI的采用（Thales Embedded Explainable AI: Towards the Adoption of AI in Critical Systems.），AI Accelerator Summit 2019

专知会员服务

21+阅读 · 2019年11月11日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

AI/ML/DNN硬件加速设计怎么入门？

AI/ML/DNN硬件加速设计怎么入门？

StarryHeavensAbove

11+阅读 · 2018年12月4日

人工智能 | ICAPS 2019等国际会议信息3条

人工智能 | ICAPS 2019等国际会议信息3条

Call4Papers

3+阅读 · 2018年9月28日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

Towards Energy-Efficient and Secure Edge AI: A Cross-Layer Framework

Arxiv

0+阅读 · 2021年9月20日

MultPIM: Fast Stateful Multiplication for Processing-in-Memory

Arxiv

0+阅读 · 2021年9月20日

Reconfigurable Low-latency Memory System for Sparse Matricized Tensor Times Khatri-Rao Product on FPGA

Arxiv

0+阅读 · 2021年9月18日

Sparbit: a new logarithmic-cost and data locality-aware MPI Allgather algorithm

Arxiv

0+阅读 · 2021年9月17日

Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback

Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback

Arxiv

0+阅读 · 2021年9月17日

Using hardware performance counters to speed up autotuning convergence on GPUs

Using hardware performance counters to speed up autotuning convergence on GPUs

Arxiv

0+阅读 · 2021年9月17日

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

Arxiv

12+阅读 · 2021年8月30日

SparseBERT: Rethinking the Importance Analysis in Self-attention

SparseBERT: Rethinking the Importance Analysis in Self-attention

Arxiv

7+阅读 · 2021年2月25日

HAQ: Hardware-Aware Automated Quantization

HAQ: Hardware-Aware Automated Quantization

Arxiv

6+阅读 · 2018年11月21日

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Arxiv

4+阅读 · 2018年3月15日

VIP会员

文章信息

相关主题

查准率/准确率

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

【Freddy Lecue博士】Thales嵌入式可解释AI：关键系统中AI的采用（Thales Embedded Explainable AI: Towards the Adoption of AI in Critical Systems.），AI Accelerator Summit 2019

【Freddy Lecue博士】Thales嵌入式可解释AI：关键系统中AI的采用（Thales Embedded Explainable AI: Towards the Adoption of AI in Critical Systems.），AI Accelerator Summit 2019

专知会员服务

21+阅读 · 2019年11月11日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《在单一作战合成环境（SSE）中运用人工智能与大型语言模型以提供灵活人文地形及可信角色组》报告

《俄罗斯的未来战争方式第二部分：核威慑》报告

《提示战争：大语言模型如何决定军事干预》报告

《俄罗斯的未来战争方式第三部分：军事改革》报告

相关资讯

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

AI/ML/DNN硬件加速设计怎么入门？

AI/ML/DNN硬件加速设计怎么入门？

StarryHeavensAbove

11+阅读 · 2018年12月4日

人工智能 | ICAPS 2019等国际会议信息3条

人工智能 | ICAPS 2019等国际会议信息3条

Call4Papers

3+阅读 · 2018年9月28日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

相关论文

Towards Energy-Efficient and Secure Edge AI: A Cross-Layer Framework

Arxiv

0+阅读 · 2021年9月20日

MultPIM: Fast Stateful Multiplication for Processing-in-Memory

Arxiv

0+阅读 · 2021年9月20日

Reconfigurable Low-latency Memory System for Sparse Matricized Tensor Times Khatri-Rao Product on FPGA

Arxiv

0+阅读 · 2021年9月18日

Sparbit: a new logarithmic-cost and data locality-aware MPI Allgather algorithm

Arxiv

0+阅读 · 2021年9月17日

Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback

Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback

Arxiv

0+阅读 · 2021年9月17日

Using hardware performance counters to speed up autotuning convergence on GPUs

Using hardware performance counters to speed up autotuning convergence on GPUs

Arxiv

0+阅读 · 2021年9月17日

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

Arxiv

12+阅读 · 2021年8月30日

SparseBERT: Rethinking the Importance Analysis in Self-attention

SparseBERT: Rethinking the Importance Analysis in Self-attention

Arxiv

7+阅读 · 2021年2月25日

HAQ: Hardware-Aware Automated Quantization

HAQ: Hardware-Aware Automated Quantization

Arxiv

6+阅读 · 2018年11月21日

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Arxiv

4+阅读 · 2018年3月15日

微信扫码咨询专知VIP会员