RedMule: A Mixed-Precision Matrix-Matrix Operation Engine for Flexible and Energy-Efficient On-Chip Linear Algebra and TinyML Training Acceleration

May 6, 2023

Yvan Tortorella, Luca Bertaccini, Luca Benini, Davide Rossi, Francesco Conti

The growing interest in TinyML, i.e., near-sensor machine learning within power budgets of a few tens of mW, is pushing toward enabling TinyML-class training rather than inference only. Current training algorithms, based on various forms of error and gradient backpropagation, rely on floating-point matrix operations to meet their precision and dynamic-range requirements. So far, the energy and power cost of these operations has been considered too high for TinyML scenarios.

This paper addresses the open challenge of near-sensor training within a power budget of a few mW and presents RedMulE (Reduced-Precision Matrix Multiplication Engine), a low-power specialized accelerator designed for multi-precision floating-point General Matrix-Matrix Operations (GEMM-Ops). RedMulE supports FP16 as well as hybrid FP8 formats, with {sign, exponent, mantissa} = ({1,4,3}, {1,5,2}). We integrate RedMulE into a Parallel Ultra-Low-Power (PULP) cluster containing eight energy-efficient RISC-V cores that share a tightly-coupled data memory, and implement the resulting system in a 22 nm technology.

At its best efficiency point (470 MHz, 0.65 V), the RedMulE-augmented PULP cluster achieves 755 GFLOPS/W and 920 GFLOPS/W on regular General Matrix-Matrix Multiplication (GEMM), and up to 1.19 TFLOPS/W and 1.67 TFLOPS/W on GEMM-Ops, for FP16 and FP8 input/output tensors, respectively. At its best performance point (613 MHz, 0.8 V), RedMulE achieves up to 58.5 GFLOPS and 117 GFLOPS for FP16 and FP8, respectively, with 99.4% utilization of its array of Computing Elements, while consuming less than 60 mW on average. These results enable on-device training of deep learning models in TinyML application scenarios, while retaining the flexibility to tackle other classes of common linear-algebra problems efficiently.
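
The two FP8 layouts split the seven non-sign bits differently between exponent and mantissa, trading precision for dynamic range. The Python sketch below decodes a bit pattern for either split and computes each format's representable range. It assumes IEEE-754-style conventions (exponent bias 2^(e-1) - 1, subnormals when the exponent field is zero, and an all-ones exponent reserved for infinity and NaN); the abstract does not state how RedMulE encodes special values, so the helpers `decode_fp8` and `format_range` are hypothetical and the resulting maxima are illustrative only.

```python
import math


def decode_fp8(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode an 8-bit float with 1 sign bit, `exp_bits` exponent bits and
    `man_bits` mantissa bits, assuming IEEE-754-style rules."""
    assert 1 + exp_bits + man_bits == 8 and 0 <= bits <= 0xFF
    sign = -1.0 if (bits >> 7) & 1 else 1.0
    exp_field = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man_field = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp_field == (1 << exp_bits) - 1:
        # All-ones exponent (assumed reserved): infinity if mantissa is 0, else NaN.
        return sign * math.inf if man_field == 0 else math.nan
    if exp_field == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * (man_field / (1 << man_bits)) * 2.0 ** (1 - bias)
    # Normal number: implicit leading 1.
    return sign * (1 + man_field / (1 << man_bits)) * 2.0 ** (exp_field - bias)


def format_range(exp_bits: int, man_bits: int) -> tuple[float, float, float]:
    """Return (smallest subnormal, smallest normal, largest finite) magnitudes."""
    bias = (1 << (exp_bits - 1)) - 1
    max_exp_field = (1 << exp_bits) - 2  # largest exponent field not reserved
    min_sub = 2.0 ** (1 - bias - man_bits)
    min_norm = 2.0 ** (1 - bias)
    max_norm = (2 - 2.0 ** -man_bits) * 2.0 ** (max_exp_field - bias)
    return min_sub, min_norm, max_norm


# {sign, exponent, mantissa} = {1,4,3} and {1,5,2}, as listed in the abstract.
for name, e, m in [("FP8 {1,4,3}", 4, 3), ("FP8 {1,5,2}", 5, 2)]:
    lo_sub, lo_norm, hi = format_range(e, m)
    print(f"{name}: min subnormal={lo_sub:g}, min normal={lo_norm:g}, max={hi:g}")
```

Under these assumptions, {1,4,3} reaches at most 240 with three mantissa bits, while {1,5,2} reaches 57344 at the cost of one bit of precision.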

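The abstract distinguishes plain GEMM from the broader family of GEMM-Ops without defining the latter here. A common way to generalize GEMM is to keep its loop structure and replace the multiply and the accumulate with other binary operators, as in the (min, +) product used for shortest paths. The sketch below assumes that reading and seeds the reduction with a third matrix, mirroring the Z = X·W + Y form of a BLAS-style GEMM with unit scaling factors. The function `gemm_op` and its `map_op`/`reduce_op` parameters are hypothetical names for illustration, not RedMulE's programming interface.

```python
import math
import operator
from functools import reduce
from typing import Callable, List

Matrix = List[List[float]]


def gemm_op(X: Matrix, W: Matrix, Y: Matrix,
            map_op: Callable[[float, float], float],
            reduce_op: Callable[[float, float], float]) -> Matrix:
    """Hypothetical generalized matrix product:
    Z[i][j] = reduce_op over k of map_op(X[i][k], W[k][j]), starting from Y[i][j].
    With map_op=mul and reduce_op=add this is the plain GEMM Z = X @ W + Y."""
    M, K, N = len(X), len(W), len(W[0])
    assert all(len(row) == K for row in X), "inner dimensions must match"
    assert len(Y) == M and all(len(row) == N for row in Y), "Y must be M x N"
    return [[reduce(reduce_op, (map_op(X[i][k], W[k][j]) for k in range(K)), Y[i][j])
             for j in range(N)]
            for i in range(M)]


# Plain GEMM: multiply, then accumulate with addition.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(gemm_op(A, B, [[0, 0], [0, 0]], operator.mul, operator.add))

# (min, +) product seeded with the edge weights D: shortest paths using at most two edges.
INF = math.inf
D = [[0, 4, INF], [INF, 0, 1], [2, INF, 0]]
print(gemm_op(D, D, D, operator.add, min))
```

Swapping only the two operators turns the same loop nest into a shortest-path step, which illustrates the kind of non-GEMM linear-algebra workload the abstract alludes to; whether RedMulE supports exactly these operator pairs is not established by this sketch.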

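The headline figures can be cross-checked with simple arithmetic. Dividing throughput by clock frequency and utilization gives the floating-point operations the array completes per cycle; counting a fused multiply-add as two FLOPs (a common convention, assumed here), 58.5 GFLOPS at 613 MHz and 99.4% utilization works out to roughly 96 FLOP/cycle, or about 48 FMAs per cycle in FP16, with the FP8 figure exactly twice that. Likewise, an efficiency in GFLOPS/W converts directly into energy per operation, because 1 GFLOPS/W equals 1 FLOP/nJ. The sketch only restates the abstract's numbers; the helper names are hypothetical.

```python
def flops_per_cycle(gflops: float, freq_mhz: float, utilization: float) -> float:
    """FLOPs the whole array completes per clock cycle, normalized to 100% utilization."""
    return (gflops * 1e9) / (freq_mhz * 1e6 * utilization)


def pj_per_flop(gflops_per_watt: float) -> float:
    """Energy per FLOP in picojoules: 1 GFLOPS/W = 1 FLOP/nJ, so E = 1000 / eff pJ."""
    return 1e3 / gflops_per_watt


# Best performance point from the abstract: 613 MHz, 99.4% utilization.
for fmt, gflops in [("FP16", 58.5), ("FP8", 117.0)]:
    fpc = flops_per_cycle(gflops, 613.0, 0.994)
    # Assumption: one fused multiply-add counts as 2 FLOPs.
    print(f"{fmt}: {fpc:.1f} FLOP/cycle, about {fpc / 2:.1f} FMA/cycle")

# Best efficiency point from the abstract (470 MHz, 0.65 V), in GFLOPS/W.
efficiency = {"FP16 GEMM": 755, "FP8 GEMM": 920,
              "FP16 GEMM-Ops": 1190, "FP8 GEMM-Ops": 1670}
for workload, gfw in efficiency.items():
    print(f"{workload}: {pj_per_flop(gfw):.2f} pJ/FLOP")
```

By this conversion the cluster spends roughly 1.3 pJ per FLOP on FP16 GEMM and about 0.6 pJ per FLOP on FP8 GEMM-Ops at its best efficiency point.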