Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators - 专知 Papers


January 27, 2021

Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators


Hamzah Abdel-Aziz,Ali Shafiee,Jong Hoon Shin,Ardavan Pedram,Joseph H. Hassoun

From arXiv; accepted to appear at the 4th Conference on Machine Learning and Systems (MLSys 2021)

In this paper, we propose a mixed-precision convolution unit architecture that supports different integer and floating-point (FP) precisions. The proposed architecture is based on low-bit inner product units and realizes higher precision through temporal decomposition. We illustrate how to integrate FP computations on an integer-based architecture and evaluate the overheads incurred by FP arithmetic support. We argue that the alignment and addition overhead for an FP inner product can be significant, since the maximum exponent difference can be up to 58 bits, which results in large alignment logic. To address this issue, we show empirically that no more than 26 product bits are required and that up to 8 bits of alignment are sufficient in most inference cases. We present novel optimizations based on these observations to reduce the FP arithmetic hardware overheads. Our empirical results, based on simulation and hardware implementation, show a significant reduction in FP16 overhead. Over a typical mixed-precision implementation, the proposed architecture achieves area improvements of up to 25% in TFLOPS/mm² and up to 46% in TOPS/mm², with power efficiency improvements of up to 40% in TFLOPS/W and up to 63% in TOPS/W.
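The 58-bit figure is consistent with FP16 operands: normal FP16 exponents span -14 to 15, so product exponents span -28 to 30, a difference of 58. The sketch below is a software model only, not the authors' datapath. It splits FP16 values into integer fields, multiplies the significands as integers, aligns every product to the largest product exponent, and optionally drops products that would need more than `max_shift` bits of alignment. The names `fp16_fields`, `inner_product`, and `max_shift` are hypothetical, and dropping (rather than truncating) out-of-window products is an assumption made for illustration.

```python
import struct
from fractions import Fraction

# FP16 (IEEE 754 binary16) exponent range for finite values.
FP16_MIN_EXP, FP16_MAX_EXP = -14, 15
# Product exponents range from -28 to 30, so alignment may need up to 58 bits.
MAX_PRODUCT_EXP_DIFF = 2 * FP16_MAX_EXP - 2 * FP16_MIN_EXP


def fp16_fields(x):
    """Return (sign, exp, sig) with value == sign * sig * 2**(exp - 10).

    Assumes x is finite and representable in FP16 (inf/NaN are not handled).
    """
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    sign = -1 if bits >> 15 else 1
    exp_field = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp_field == 0:                      # zero or subnormal: no hidden bit
        return sign, FP16_MIN_EXP, frac
    return sign, exp_field - 15, frac | 0x400   # normal: restore hidden bit


def inner_product(a, b, max_shift=MAX_PRODUCT_EXP_DIFF):
    """Integer-based FP16 inner product with bounded alignment (a sketch).

    Products whose exponent is more than max_shift below the largest
    product exponent are dropped; with max_shift=58 the result is exact.
    """
    terms = []
    for x, y in zip(a, b):
        sx, ex, mx = fp16_fields(x)
        sy, ey, my = fp16_fields(y)
        if mx and my:                       # skip zero products
            terms.append((sx * sy * mx * my, ex + ey))
    if not terms:
        return Fraction(0)
    e_max = max(e for _, e in terms)
    acc = 0
    for p, e in terms:
        shift = e_max - e                   # alignment this product needs
        if shift <= max_shift:
            # Left-shift into a window max_shift bits wider than the product,
            # so no bits are lost for products inside the window.
            acc += p << (max_shift - shift)
    return Fraction(acc) * Fraction(2) ** (e_max - 20 - max_shift)


if __name__ == "__main__":
    a = [1.0, 2.0 ** -10, 3.5]
    b = [1.0, 1.0, 2.0]
    print(inner_product(a, b))               # exact: 8193/1024
    print(inner_product(a, b, max_shift=8))  # small term dropped: 8
```

In this example the middle product sits 12 binades below the largest one, so an 8-bit alignment window discards it and the result changes by 2^-10. Whether such a loss is acceptable is the empirical question the abstract addresses for inference workloads.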

