With growing model sizes, deep neural networks (DNNs) are increasingly trained across massive fleets of GPU accelerators, which demands a proper parallelization plan that transforms a DNN model into fine-grained tasks and then schedules them onto GPUs for execution. Due to the large search space, contemporary parallelization plan generators often rely on empirical rules that couple transformation and scheduling, and fall short in exploring more flexible schedules that yield better memory usage and compute efficiency. This tension is exacerbated by emerging models of increasing structural complexity and size. SuperScaler is a system that facilitates the design and generation of highly flexible parallelization plans. It explicitly formulates plan design and generation as three sequential phases: model transformation, space-time scheduling, and data-dependency preserving. This principled approach decouples multiple seemingly intertwined factors and enables the composition of highly flexible parallelization plans. As a result, SuperScaler can not only generate empirical parallelization plans, but also construct new plans that achieve up to 3.5× speedup over state-of-the-art solutions such as DeepSpeed, Megatron, and Alpa, for emerging DNN models like Swin-Transformer and AlphaFold2, as well as well-optimized models like GPT-3.
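The three-phase formulation above can be sketched in miniature. This is a hypothetical illustration of the decoupling idea, not the actual SuperScaler API: every function, class, and parameter name here is invented. Phase 1 splits model operators into fine-grained tasks, phase 2 assigns each task a (device, time-step) slot independently of how the split was done, and phase 3 inserts the communication needed to preserve data dependencies across devices.

```python
from dataclasses import dataclass

# All names below are illustrative assumptions, not SuperScaler's real interface.

@dataclass(frozen=True)
class Task:
    op: str      # operator this task came from (e.g. a sharded matmul)
    shard: int   # which fine-grained shard of that operator

# Phase 1: model transformation -- split each operator into fine-grained tasks.
def transform(model_ops, num_shards):
    return [Task(op, s) for op in model_ops for s in range(num_shards)]

# Phase 2: space-time scheduling -- map each task to a device (space) and a
# time step, chosen independently of the transformation above.
def schedule(tasks, num_devices):
    plan = {}
    for i, t in enumerate(tasks):
        plan[t] = (i % num_devices, i // num_devices)  # (device, time step)
    return plan

# Phase 3: data-dependency preserving -- add communication wherever shards of
# the same operator landed on different devices.
def add_dependencies(plan):
    devices_by_op = {}
    for t, (dev, _) in plan.items():
        devices_by_op.setdefault(t.op, set()).add(dev)
    return [(op, "all-reduce")
            for op, devs in devices_by_op.items() if len(devs) > 1]

tasks = transform(["matmul", "softmax"], num_shards=2)
plan = schedule(tasks, num_devices=2)
comms = add_dependencies(plan)
```

Because the schedule is computed separately from the transformation, swapping in a different `schedule` (e.g. placing all shards of one operator on the same device) changes memory and compute trade-offs without touching phases 1 or 3; this is the flexibility the decoupling buys.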