Deploying neural networks on constrained hardware platforms such as 32-bit microcontrollers is a challenging task because of the large memory, computing and energy requirements of their inference process. To tackle these issues, several convolution primitives have been proposed to make the standard convolution more computationally efficient. However, few of these primitives have actually been implemented for 32-bit microcontrollers. In this work, we collect different state-of-the-art convolutional primitives and propose an implementation for the ARM Cortex-M processor family with an open source deployment platform (NNoM). We then carry out experimental characterization tests on these implementations. Our benchmark reveals a linear relationship between theoretical MACs and energy consumption, thus showing the advantages of using computationally efficient primitives like shift convolution. We discuss the significant reduction in latency and energy consumption enabled by SIMD instructions and highlight the importance of data reuse in those performance gains. For reproducibility purposes and further experiments, the code and experiments are publicly available.