The acceleration of deep-learning kernels in hardware relies on matrix multiplications that are executed efficiently on Systolic Arrays (SAs). To effectively trade off deep-learning training/inference quality against hardware cost, SA accelerators employ reduced-precision Floating-Point (FP) arithmetic. In this work, we demonstrate the need for new pipeline organizations that reduce the latency and improve the energy efficiency of reduced-precision FP operators for the chained multiply-add operation imposed by the structure of the SA. The proposed skewed pipeline design reorganizes the pipelined operation of the FP multiply-add units and enables new forwarding paths for the exponent logic, which allow the pipeline stages of consecutive Processing Elements (PEs) to execute in parallel. As a result, the latency of matrix multiplication within the SA is significantly reduced at minimal hardware cost, yielding energy reductions of 8% and 11% for the examined state-of-the-art CNNs.
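To make the dependency that motivates this work concrete, the following is a minimal behavioral sketch (not the authors' design or RTL) of the chained multiply-add along one column of a weight-stationary systolic array: each PE adds its product to the partial sum received from the previous PE, so consecutive PEs form a serial accumulation chain. The function name `column_mac` and the data values are illustrative assumptions.

```python
def column_mac(weights, activations):
    """Propagate a partial sum through a column of PEs.

    weights[i] is the stationary weight held by PE i; activations[i] is the
    activation streamed into PE i for the same output.
    """
    psum = 0.0
    for w, a in zip(weights, activations):
        # Each PE performs psum_out = psum_in + w * a: a chained multiply-add
        # whose result feeds the next PE in the column.
        psum = psum + w * a
    return psum


if __name__ == "__main__":
    # One output of a 4-element dot product, as a single SA column computes it.
    print(column_mac([0.5, -1.25, 2.0, 0.75], [1.0, 2.0, -0.5, 4.0]))
```

In a pipelined FP implementation of this chain, each PE's addition depends on the result produced by the PE before it; reorganizing the pipeline so that parts of consecutive PEs' operations overlap is what the skewed pipeline targets.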