End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet? - 专知论文

Layer · Integration · Cluster · Inference · Performance

September 3, 2021

End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet?

Gianmarco Ottavi, Geethan Karunaratne, Francesco Conti, Irem Boybat, Luca Benini, Davide Rossi

From arXiv, 4 pages, 6 figures, conference

In-Memory Acceleration (IMA) promises major efficiency improvements in deep neural network (DNN) inference, but challenges remain in integrating an IMA within a digital system. We propose a heterogeneous architecture that couples 8 RISC-V cores with an IMA in a shared-memory cluster, and we analyze the benefits and trade-offs of in-memory computing on the realistic use case of a MobileNetV2 bottleneck layer. We explore several IMA integration strategies, analyzing performance, area, and energy efficiency. We show that while pointwise layers achieve significant speed-ups over a software implementation, on depthwise layers the inability to map parameters efficiently onto the accelerator leads to a significant trade-off between throughput and area. We propose a hybrid solution in which pointwise convolutions are executed on the IMA and depthwise convolutions on the cluster cores, achieving a 3x speed-up over software (SW) execution while saving 50% of the area compared to an all-in-IMA solution with similar performance.
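The mapping problem for depthwise layers can be illustrated with a rough cell count. The sketch below is not the authors' model. It assumes the IMA stores weights in a 2-D array (for example a crossbar) with one column per output channel and one row per input element, and it assumes a naive mapping of the whole layer onto a single array. The function names and the channel counts are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def pointwise_cells(c_in: int, c_out: int) -> Tuple[int, int]:
    """Return (used, allocated) cells for a 1x1 convolution.

    Every output channel reads every input channel, so the
    c_in x c_out weight matrix is dense and fills the array.
    """
    cells = c_in * c_out
    return cells, cells


def depthwise_cells(channels: int, k: int = 3) -> Tuple[int, int]:
    """Return (used, allocated) cells for a k x k depthwise convolution.

    Assumption: the layer is laid out as one block-diagonal matrix.
    Each output channel reads only its own k*k inputs, so a column
    holds k*k useful weights and zeros everywhere else.
    """
    rows = k * k * channels      # one row per input element
    cols = channels              # one column per output channel
    used = k * k * channels      # non-zero weights
    return used, rows * cols


def utilization(used: int, allocated: int) -> float:
    """Fraction of allocated cells that hold a real weight."""
    return used / allocated


if __name__ == "__main__":
    c_in, c_out = 32, 128        # hypothetical layer sizes
    pw = utilization(*pointwise_cells(c_in, c_out))
    dw = utilization(*depthwise_cells(c_out))
    print(f"pointwise utilization: {pw:.4f}")
    print(f"depthwise utilization: {dw:.4f}")
```

Under these assumptions the depthwise utilization falls as 1/channels, which is one way to read the throughput-versus-area trade-off described in the abstract: either most of the array is spent on zeros, or the layer is split into smaller pieces that run sequentially.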
