Deep Neural Networks (DNNs) have transformed the field of machine learning and are widely deployed in applications involving image, video, speech, and natural language processing. The increasing compute demands of DNNs have largely been addressed through Graphics Processing Units (GPUs) and specialized accelerators. However, as model sizes grow, these von Neumann architectures require very high memory bandwidth to keep their processing elements utilized, since the majority of the data resides in main memory. Processing in memory has been proposed as a promising solution to the memory-wall bottleneck for ML workloads. In this work, we propose a new DRAM-based processing-in-memory (PIM) multiplication primitive, coupled with intra-bank accumulation, to accelerate matrix-vector operations in ML workloads. The proposed multiplication primitive adds < 1% area overhead and does not require any change to the DRAM peripherals; it can therefore be easily adopted in commodity DRAM chips. We then design a DRAM-based PIM architecture, data mapping scheme, and dataflow for executing DNNs within DRAM. System evaluations on networks such as AlexNet, VGG16, and ResNet18 show that the proposed architecture, mapping, and dataflow provide up to a 19.5x speedup over an NVIDIA Titan Xp GPU, highlighting the need to overcome the memory bottleneck in future generations of DNN hardware.
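As a minimal illustration of why these workloads are memory-bound (not a sketch of the paper's PIM primitive itself): a fully-connected DNN layer is a matrix-vector product in which each weight is fetched from main memory once and reused zero times, so arithmetic intensity is only about 2 FLOPs per weight word. The function and shapes below are hypothetical, chosen for the example.

```python
import numpy as np

def fc_layer(W, x, b):
    """Dense layer: y = W @ x + b. Weight traffic dominates memory accesses,
    since W has M*N elements versus N inputs and M outputs."""
    return W @ x + b

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # toy weight matrix (M=4, N=8)
x = rng.standard_normal(8)        # input activation vector
b = np.zeros(4)
y = fc_layer(W, x, b)

# Arithmetic intensity: 2*M*N FLOPs (multiply + accumulate per weight)
# over ~M*N weight words read from memory -> ~2 FLOPs/word, far below
# what a GPU needs to stay compute-bound. PIM avoids the transfer by
# computing where the weights already reside.
intensity = (2 * W.size) / W.size
print(intensity)  # -> 2.0
```

The low FLOPs-per-byte ratio is why, at large model sizes, off-chip bandwidth rather than compute throughput limits matrix-vector heavy inference on von Neumann machines.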