G-GPU:类似 GPU ASIC 加速器的全自动发电机 (G-GPU: A Fully-Automated Generator of GPU-like ASIC Accelerators) - 专知论文

会员服务 ·

0

Performer · Automator · 可辨认的 · Processing（编程语言） · 前向 ·

2021 年 11 月 11 日

G-GPU: A Fully-Automated Generator of GPU-like ASIC Accelerators

翻译：G-GPU:类似 GPU ASIC 加速器的全自动发电机

Tiago Diadami Perez,Márcio M. Gonçalves,José Rodrigo Azambuja,Leonardo Gobatto,Marcelo Brandalero,Samuel Pagliarini

Modern Systems on Chip (SoC), almost as a rule, require accelerators for achieving energy efficiency and high performance for specific tasks that are not necessarily well suited for execution in standard processing units. Considering the broad range of applications and necessity for specialization, the design of SoCs has thus become expressively more challenging. In this paper, we put forward the concept of G-GPU, a general-purpose GPU-like accelerator that is not application-specific but still gives benefits in energy efficiency and throughput. Furthermore, we have identified an existing gap for these accelerators in ASIC, for which no known automated generation platform/tool exists. Our solution, called GPUPlanner, is an open-source generator of accelerators, from RTL to GDSII, that addresses this gap. Our analysis results show that our automatically generated G-GPU designs are remarkably efficient when compared against the popular CPU architecture RISC-V, presenting speed-ups of up to 223 times in raw performance and up to 11 times when the metric is performance derated by area. These results are achieved by executing a design space exploration of the GPU-like accelerators, where the memory hierarchy is broken in a smart fashion and the logic is pipelined on demand. Finally, tapeout-ready layouts of the G-GPU in 65nm CMOS are presented.

翻译：近似于常规的芯片(SOC)现代系统需要加速器来实现能源效率和高性能,而具体任务不一定适合标准处理单位执行。考虑到应用和专业化需要的广泛范围,SoC的设计因此变得格外具有挑战性。在本文件中,我们提出了G-GPU的概念,G-GPU是一个通用的GPU式加速器,它不是具体应用的通用GPU式加速器,但在能源效率和吞吐量方面仍然带来效益。此外,我们已经为ASIC的这些加速器找出了现有差距,因为没有已知的自动生成平台/工具。我们称为GPUPUPlanner的解决方案是从RTL到GDSSII的加速器的开源生成器,从而解决了这一差距。我们的分析结果表明,我们自动生成的G-GPU的G加速器设计与广受欢迎的CPU结构(RISC-V)相比,效率非常高,在原始性能表现方面速度高达223倍,在指标被区域贬低时达到11倍。我们称之为GPUPRER的解决方案,这些结果通过执行智能的G-rodemod Stimstal develop drutlock-hal lades the the lades the the lades the lades lades lades lades the lades lappral-s

0

相关内容

Performer

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

专知会员服务

61+阅读 · 2019年12月29日

【O'Reilly TensorFlow Conference 2019】MLIR：加速人工智能（MLIR: Accelerating AI）

【O'Reilly TensorFlow Conference 2019】MLIR：加速人工智能（MLIR: Accelerating AI）

专知会员服务

7+阅读 · 2019年11月14日

【Freddy Lecue博士】Thales嵌入式可解释AI：关键系统中AI的采用（Thales Embedded Explainable AI: Towards the Adoption of AI in Critical Systems.），AI Accelerator Summit 2019

【Freddy Lecue博士】Thales嵌入式可解释AI：关键系统中AI的采用（Thales Embedded Explainable AI: Towards the Adoption of AI in Critical Systems.），AI Accelerator Summit 2019

专知会员服务

21+阅读 · 2019年11月11日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

10+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

AI/ML/DNN硬件加速设计怎么入门？

AI/ML/DNN硬件加速设计怎么入门？

StarryHeavensAbove

11+阅读 · 2018年12月4日

Windows操作系统全面兼容机器人操作系统ROS1和ROS2

Windows操作系统全面兼容机器人操作系统ROS1和ROS2

无人机

5+阅读 · 2018年10月4日

五个精彩实用的自然语言处理资源

五个精彩实用的自然语言处理资源

机器学习研究会

6+阅读 · 2018年2月23日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

最佳实践：深度学习用于自然语言处理（三）

最佳实践：深度学习用于自然语言处理（三）

待字闺中

3+阅读 · 2017年8月20日

Negative curvature obstructs acceleration for geodesically convex optimization, even with exact first-order oracles

Arxiv

0+阅读 · 2022年1月14日

Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication

Arxiv

0+阅读 · 2022年1月13日

An Accelerator for Rule Induction in Fuzzy Rough Theory

Arxiv

0+阅读 · 2022年1月7日

AI Accelerator Survey and Trends

Arxiv

28+阅读 · 2021年9月18日

Generative Video Transformer: Can Objects be the Words?

Arxiv

6+阅读 · 2021年7月20日

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

Arxiv

4+阅读 · 2018年10月24日

Beyond Trade-off: Accelerate FCN-based Face Detector with Higher Accuracy

Arxiv

4+阅读 · 2018年4月14日

Where to put the Image in an Image Caption Generator

Arxiv

3+阅读 · 2018年3月14日

CuLDA_CGS: Solving Large-scale LDA Problems on GPUs

Arxiv

3+阅读 · 2018年3月13日

High-Resolution Deep Convolutional Generative Adversarial Networks

Arxiv

8+阅读 · 2018年1月27日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

专知会员服务

61+阅读 · 2019年12月29日

【O'Reilly TensorFlow Conference 2019】MLIR：加速人工智能（MLIR: Accelerating AI）

【O'Reilly TensorFlow Conference 2019】MLIR：加速人工智能（MLIR: Accelerating AI）

专知会员服务

7+阅读 · 2019年11月14日

【Freddy Lecue博士】Thales嵌入式可解释AI：关键系统中AI的采用（Thales Embedded Explainable AI: Towards the Adoption of AI in Critical Systems.），AI Accelerator Summit 2019

【Freddy Lecue博士】Thales嵌入式可解释AI：关键系统中AI的采用（Thales Embedded Explainable AI: Towards the Adoption of AI in Critical Systems.），AI Accelerator Summit 2019

专知会员服务

21+阅读 · 2019年11月11日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能绝不能完全自主》

《人工智能的法律与伦理：军事自主机器独特挑战的深度剖析》316页

从数据到主导：AI与兵棋推演构筑决策优势

《特洛伊木马货柜：武器化集装箱的战略威胁》最新报告

相关资讯

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

10+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

AI/ML/DNN硬件加速设计怎么入门？

AI/ML/DNN硬件加速设计怎么入门？

StarryHeavensAbove

11+阅读 · 2018年12月4日

Windows操作系统全面兼容机器人操作系统ROS1和ROS2

Windows操作系统全面兼容机器人操作系统ROS1和ROS2

无人机

5+阅读 · 2018年10月4日

五个精彩实用的自然语言处理资源

五个精彩实用的自然语言处理资源

机器学习研究会

6+阅读 · 2018年2月23日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

最佳实践：深度学习用于自然语言处理（三）

最佳实践：深度学习用于自然语言处理（三）

待字闺中

3+阅读 · 2017年8月20日

相关论文

Negative curvature obstructs acceleration for geodesically convex optimization, even with exact first-order oracles

Arxiv

0+阅读 · 2022年1月14日

Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication

Arxiv

0+阅读 · 2022年1月13日

An Accelerator for Rule Induction in Fuzzy Rough Theory

Arxiv

0+阅读 · 2022年1月7日

AI Accelerator Survey and Trends

Arxiv

28+阅读 · 2021年9月18日

Generative Video Transformer: Can Objects be the Words?

Arxiv

6+阅读 · 2021年7月20日

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

Arxiv

4+阅读 · 2018年10月24日

Beyond Trade-off: Accelerate FCN-based Face Detector with Higher Accuracy

Arxiv

4+阅读 · 2018年4月14日

Where to put the Image in an Image Caption Generator

Arxiv

3+阅读 · 2018年3月14日

CuLDA_CGS: Solving Large-scale LDA Problems on GPUs

Arxiv

3+阅读 · 2018年3月13日

High-Resolution Deep Convolutional Generative Adversarial Networks

Arxiv

8+阅读 · 2018年1月27日

微信扫码咨询专知VIP会员