Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms to resource-constrained and battery-powered devices poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have proven to be effective strategies for tackling these problems. We present Dustin, a fully programmable compute cluster integrating 16 RISC-V cores capable of 2- to 32-bit arithmetic and all possible mixed-precision permutations. In addition to the conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm, Dustin introduces a Vector Lockstep Execution Mode (VLEM) to minimize power consumption in highly data-parallel kernels. In VLEM, a single leader core fetches instructions and broadcasts them to the 15 follower cores. Clock-gating the Instruction Fetch (IF) stages and private caches of the follower cores reduces power by 38% with minimal performance overhead (<3%). The cluster, implemented in 65 nm CMOS technology, achieves a peak performance of 58 GOPS and a peak efficiency of 1.15 TOPS/W.
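To make the VLEM concept concrete, the sketch below shows how a data-parallel kernel might be written so that all 16 cores execute an identical instruction stream and the cluster can be switched into lockstep mode around it. The abstract does not specify the software interface, so the control-register address, `vlem_enable`, `core_id`, and `cluster_barrier` are hypothetical names used only for illustration, not the actual Dustin API.

```c
/* Minimal sketch, assuming a memory-mapped VLEM control register and a
 * simple runtime with per-core IDs and barriers. All names and addresses
 * below are hypothetical illustrations, not the real Dustin interface. */
#include <stdint.h>

#define NUM_CORES        16
#define VLEM_CTRL_ADDR   0x10200100u  /* hypothetical cluster control register */

static inline void vlem_enable(int on)
{
    volatile uint32_t *ctrl = (volatile uint32_t *)VLEM_CTRL_ADDR;
    *ctrl = on ? 1u : 0u;  /* leader broadcasts; followers' IF stages are clock-gated */
}

/* Hypothetical runtime primitives assumed to exist on the cluster. */
extern uint32_t core_id(void);
extern void     cluster_barrier(void);

/* Element-wise 8-bit saturating add: each core processes its own stripe,
 * so all cores naturally execute the same instruction sequence in lockstep. */
void vec_add_i8(const int8_t *a, const int8_t *b, int8_t *c, uint32_t n)
{
    uint32_t id    = core_id();
    uint32_t chunk = (n + NUM_CORES - 1) / NUM_CORES;
    uint32_t start = id * chunk;
    uint32_t end   = (start + chunk > n) ? n : start + chunk;

    cluster_barrier();            /* align all cores before entering VLEM */
    if (id == 0)
        vlem_enable(1);
    cluster_barrier();

    for (uint32_t i = start; i < end; i++) {
        int16_t s = (int16_t)a[i] + (int16_t)b[i];
        c[i] = (s > 127) ? 127 : (s < -128 ? -128 : (int8_t)s);
    }

    cluster_barrier();            /* re-align before leaving lockstep mode */
    if (id == 0)
        vlem_enable(0);
    cluster_barrier();
}
```

In this usage pattern the barriers keep the cores aligned when entering and leaving lockstep execution; inside the region the kernel is purely data-parallel, which is the case where broadcasting one instruction stream from the leader core saves fetch and cache power.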