Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedups when executing FC layers. This paper introduces a novel heterogeneous, mixed-signal, and mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance. To leverage the strengths of TPUs for convolutional layers and IMAC circuits for dense layers, we propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when deploying models on the TPU-IMAC architecture. Simulations demonstrate that the TPU-IMAC configuration achieves up to $2.59\times$ performance improvement and $88\%$ memory reduction compared to conventional TPU architectures for various CNN models, while maintaining comparable accuracy. The TPU-IMAC architecture shows promise for applications where energy efficiency and high performance are essential, such as edge computing and real-time processing in mobile devices. The unified training algorithm and the integration of IMAC and TPU architectures contribute to the potential impact of this research on the broader machine learning landscape.
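The mixed-precision split at the heart of this design pairs higher-precision digital arithmetic on the TPU path with very low-precision weights suited to analog IMAC crossbars. The sketch below contrasts the two regimes on a single FC layer; the specific quantization schemes (symmetric int8 and ternary weights) are illustrative assumptions for this example, not the paper's actual training algorithm:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8 quantization, representative of digital
    # TPU-style inference (illustrative assumption).
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_ternary(w):
    # Ternary {-1, 0, +1} weights, a common choice for analog in-memory
    # crossbars (illustrative assumption; not the paper's exact scheme).
    delta = 0.7 * np.abs(w).mean()
    q = np.where(w > delta, 1, np.where(w < -delta, -1, 0)).astype(np.int8)
    scale = np.abs(w[q != 0]).mean() if np.any(q != 0) else 1.0
    return q, scale

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64)).astype(np.float32)
w_fc = rng.standard_normal((64, 10)).astype(np.float32)

# Same FC layer evaluated under both precision regimes.
q8, s8 = quantize_int8(w_fc)
y_int8 = x @ (q8.astype(np.float32) * s8)

qt, st = quantize_ternary(w_fc)
y_ternary = x @ (qt.astype(np.float32) * st)

# Relative output error versus the float32 reference.
y_ref = x @ w_fc
err8 = np.linalg.norm(y_ref - y_int8) / np.linalg.norm(y_ref)
errt = np.linalg.norm(y_ref - y_ternary) / np.linalg.norm(y_ref)
print(f"relative error  int8: {err8:.4f}  ternary: {errt:.4f}")
```

The ternary path incurs a larger per-layer error than int8, which is why a unified training scheme that anticipates the low-precision analog layers during training is needed to recover end-to-end accuracy.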