Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural Network (DNN) inference by reducing costly data movement and by using resistive RAM (ReRAM) for efficient analog compute. Unfortunately, overall PIM accelerator efficiency is limited by energy-intensive analog-to-digital converters (ADCs). Furthermore, existing accelerators that reduce ADC cost do so by changing DNN weights or by using low-resolution ADCs that reduce output fidelity. These strategies harm DNN accuracy and/or require costly DNN retraining to compensate. To address these issues, we propose the RAELLA architecture. RAELLA adapts the architecture to each DNN; it lowers the resolution of computed analog values by encoding weights to produce near-zero analog values, adaptively slicing weights for each DNN layer, and dynamically slicing inputs through speculation and recovery. Low-resolution analog values allow RAELLA to both use efficient low-resolution ADCs and maintain accuracy without retraining, all while computing with fewer ADC conversions. Compared to other low-accuracy-loss PIM accelerators, RAELLA increases energy efficiency by up to 4.9$\times$ and throughput by up to 3.3$\times$. Compared to PIM accelerators that cause accuracy loss and retrain DNNs to recover, RAELLA achieves similar efficiency and throughput without expensive DNN retraining.
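The core idea of encoding weights to produce near-zero analog values can be illustrated with a minimal numerical sketch. This is not RAELLA's exact scheme; it assumes a hypothetical center-offset encoding in which each crossbar column stores weights as offsets from a per-column center, shrinking the magnitude of the analog column sums the ADC must digitize, while a cheap digital correction restores the exact dot product:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.1, 0.05, size=(128, 4))  # 128 rows x 4 crossbar columns
inputs = rng.random(128)                        # input activations

# Hypothetical encoding: store each column as offsets from its mean ("center"),
# so the analog values accumulated on each column concentrate near zero.
centers = weights.mean(axis=0)
offsets = weights - centers

analog_sums = inputs @ offsets                  # what the ADC would digitize
restored = analog_sums + centers * inputs.sum() # digital correction per column

exact = inputs @ weights
assert np.allclose(restored, exact)             # dot product is recovered exactly

# Encoded sums span a much smaller range than the raw sums, so a
# lower-resolution ADC can cover them without clipping.
print("encoded max:", np.abs(analog_sums).max())
print("raw max:    ", np.abs(exact).max())
```

The correction term `centers * inputs.sum()` is computed digitally once per column, so the expensive analog-to-digital step only ever sees the small offset sums.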