ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs - 专知论文

会员服务 ·

0

Performer · 评论员 · 编译器 · TVM · Tensor ·

2023 年 5 月 6 日

ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs

翻译：暂无翻译

Guyue Huang,Yang Bai,Liu Liu,Yuke Wang,Bei Yu,Yufei Ding,Yuan Xie

Pipelining between data loading and computation is a critical tensor program optimization for GPUs. In order to unleash the high performance of latest GPUs, we must perform a synergetic optimization of multi-stage pipelining across the multi-level buffer hierarchy of GPU. Existing frameworks rely on hand-written libraries such as cuBLAS to perform pipelining optimization, which is inextensible to new operators and un-composable with prior tensor compiler optimizations. This paper presents ALCOP, the first framework that is compiler-native and fully supports multi-stage multi-level pipelining. ALCOP overcomes three critical obstacles in generating code for pipelining: detection of pipelining-applicable buffers, program transformation for multi-level multi-stage pipelining, and efficient schedule parameter search by incorporating static analysis. Experiments show that ALCOP can generate programs with 1.23x speedup on average (up to 1.73x) over vanilla TVM. On end-to-end models, ALCOP can improve upon TVM by up to 1.18x, and XLA by up to 1.64x. Besides, our performance model significantly improves the efficiency of the schedule tuning process and can find schedules with 99% of the performance given by exhaustive search while costing 40x fewer trials.

翻译：暂无翻译

0

相关内容

Performer

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

126+阅读 · 2022年4月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

基于非易失内存设备的数据读写性能优化方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向10Tb/in2级磁存储系统的二维LDPC码设计

国家自然科学基金

0+阅读 · 2015年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

资源虚拟化环境中面向I/O密集型负载的能效优化策略研究

国家自然科学基金

0+阅读 · 2013年12月31日

Curcumin双向调控HO-1/HO-2协同抑制Aβeme复合物防治AD的分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

Arxiv

0+阅读 · 2023年6月23日

Addressing Discontinuous Root-Finding for Subsequent Differentiability in Machine Learning, Inverse Problems, and Control

Addressing Discontinuous Root-Finding for Subsequent Differentiability in Machine Learning, Inverse Problems, and Control

Arxiv

0+阅读 · 2023年6月21日

Pushing the Limits of Machine Design: Automated CPU Design with AI

Arxiv

0+阅读 · 2023年6月21日

The Deep Learning Compiler: A Comprehensive Survey

Arxiv

15+阅读 · 2020年2月6日

Simplifying Graph Convolutional Networks

Simplifying Graph Convolutional Networks

Arxiv

12+阅读 · 2019年2月19日

VIP会员

文章信息

相关主题

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

126+阅读 · 2022年4月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】面向真实世界音视联合语音识别的可扩展框架

《通过仿真与开源数据提升战略决策：机遇与局限》最新报告

【AAAI2026】善始则事半功倍：基于前缀优化的大语言模型推理强化学习

评估大语言模型在科学发现中的作用

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

Arxiv

0+阅读 · 2023年6月23日

Addressing Discontinuous Root-Finding for Subsequent Differentiability in Machine Learning, Inverse Problems, and Control

Addressing Discontinuous Root-Finding for Subsequent Differentiability in Machine Learning, Inverse Problems, and Control

Arxiv

0+阅读 · 2023年6月21日

Pushing the Limits of Machine Design: Automated CPU Design with AI

Arxiv

0+阅读 · 2023年6月21日

The Deep Learning Compiler: A Comprehensive Survey

Arxiv

15+阅读 · 2020年2月6日

Simplifying Graph Convolutional Networks

Simplifying Graph Convolutional Networks

Arxiv

12+阅读 · 2019年2月19日

相关基金

基于非易失内存设备的数据读写性能优化方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向10Tb/in2级磁存储系统的二维LDPC码设计

国家自然科学基金

0+阅读 · 2015年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

资源虚拟化环境中面向I/O密集型负载的能效优化策略研究

国家自然科学基金

0+阅读 · 2013年12月31日

Curcumin双向调控HO-1/HO-2协同抑制Aβeme复合物防治AD的分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员