We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel runtimes) and the device idle time are important components of the overall device time. We therefore tackle them separately by (1) flexibly adopting heuristic-based and ML-based kernel performance models for operators that dominate the device active time, and (2) categorizing operator overheads into five types to quantify their contribution to the device idle time. Combining these two parts, we propose a critical-path-based algorithm that predicts the per-batch training time of DLRM by traversing its execution graph. We achieve less than 10% geometric mean average error (GMAE) for all kernel performance models, and 5.23% and 7.96% geomean errors for GPU active time and overall end-to-end per-batch training time prediction, respectively. We show that our general performance model not only achieves low prediction error on DLRM, which has highly customized configurations and whose performance is dominated by multiple factors, but also yields comparable accuracy on other compute-bound ML models targeted by most previous methods. Using this performance model together with graph-level data and task dependency analyses, we show that our system can support more general model-system co-design than previous methods.
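To make the critical-path idea concrete, the following is a minimal, hypothetical Python sketch of how such a traversal could combine per-operator kernel-time predictions with categorized overheads. The Op fields, function names, and timing values are illustrative assumptions for exposition, not the paper's implementation.

# Hypothetical sketch: each operator's finish time is the max of its
# predecessors' finish times plus its own overhead and predicted kernel time;
# the predicted per-batch time is the longest finish time in the graph.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Op:
    name: str
    kernel_time_us: float          # from a heuristic- or ML-based kernel model
    overhead_us: float             # from the per-type overhead characterization
    deps: List[str] = field(default_factory=list)

def predict_batch_time(graph: Dict[str, Op]) -> float:
    """Return the predicted per-batch time via a critical-path traversal."""
    finish: Dict[str, float] = {}

    def finish_time(name: str) -> float:
        if name in finish:
            return finish[name]
        op = graph[name]
        start = max((finish_time(d) for d in op.deps), default=0.0)
        finish[name] = start + op.overhead_us + op.kernel_time_us
        return finish[name]

    return max(finish_time(n) for n in graph)

# Tiny usage example with made-up timings for a DLRM-like graph:
g = {
    "embedding_lookup": Op("embedding_lookup", 120.0, 15.0),
    "bottom_mlp":       Op("bottom_mlp", 80.0, 10.0),
    "interaction":      Op("interaction", 40.0, 8.0, ["embedding_lookup", "bottom_mlp"]),
    "top_mlp":          Op("top_mlp", 95.0, 10.0, ["interaction"]),
}
print(f"predicted per-batch time: {predict_batch_time(g):.1f} us")

The recursion simply memoizes finish times along dependency edges, so the longest (critical) path through kernel times plus overheads determines the predicted per-batch time.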