打破分布式机器学习工作量的计算和通信的计算和通信抽象障碍 (Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads) - 专知论文

会员服务 ·

0

优化器 · 分布式机器学习 · Machine Learning · 学成 · Performer ·

2021 年 11 月 16 日

Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads

翻译：打破分布式机器学习工作量的计算和通信的计算和通信抽象障碍

Abhinav Jangda,Jun Huang,Guodong Liu,Amir Hossein Nodehi Sabet,Saeed Maleki,Youshan Miao,Madanlal Musuvathi,Todd Mytkowicz,Olli Sarikivi

Recent trend towards increasing large machine learning models require both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain best performance. However, current logical separation between computation and communication kernels in deep learning frameworks misses the optimization opportunities across such barrier. Breaking this abstraction with a holistic consideration can provide many optimizations to provide performance improvements in distributed workloads. Manually applying these optimizations needs modifications in underlying computation and communication libraries for each scenario, which is time consuming and error-prone. Therefore, we present CoCoNeT, with a DSL to express a program with both computation and communication. CoCoNeT contains several machine learning aware transformations to optimize a program and a compiler to generate high performance kernels. Providing both computation and communication as first class constructs allows users to work on a high-level abstraction and apply powerful optimizations, such as fusion or overlapping of communication and computation. CoCoNeT enables us to optimize data-, model-and pipeline-parallel workloads in large language models with only a few lines of code. Experiments show CoCoNeT significantly outperforms state-of-the-art distributed machine learning implementations.

翻译：考虑到培训这些模型的庞大成本,我们必须在计算和通信中解开优化的优化,以取得最佳业绩。然而,深学习框架中计算和通信核心之间的当前逻辑分解却错过了这种障碍的优化机会。将这一抽取与整体考虑分开,可以提供许多优化,以提高分布式工作量的绩效。手工应用这些优化需要修改每个情景的基本计算和通信库,因为每个情景既耗时又容易出错。因此,我们向COoNET提交一个DSL,以表达一个与计算和通信有关的程序。CoNET包含一些机器学习意识转换,以优化一个程序,而编译者则生成高性能核心。将计算和通信作为一流结构提供,使用户能够进行高层次的抽取工作,并应用强大的优化,例如通信和计算的聚合或重叠。CoNeT使我们能够在大型语言模型中优化数据、模型和管道工作量,只有几行版本的代码。CoNeuringS-CON-CoPOLPER 外出大型语言模型,只有几行的学习模式。

0

相关内容

优化器

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

专知会员服务

124+阅读 · 2019年12月23日

机器学习与物理科学（Machine learning and the physical sciences），附44页pdf

机器学习与物理科学（Machine learning and the physical sciences），附44页pdf

专知会员服务

51+阅读 · 2019年12月10日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

计算机类 | PLDI 2020等国际会议信息6条

计算机类 | PLDI 2020等国际会议信息6条

Call4Papers

3+阅读 · 2019年7月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

视觉机械臂 visual-pushing-grasping

视觉机械臂 visual-pushing-grasping

CreateAMind

3+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Flexible Parallel Learning in Edge Scenarios: Communication, Computational and Energy Cost

Arxiv

0+阅读 · 2022年1月19日

ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table

Arxiv

7+阅读 · 2021年4月17日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

Arxiv

7+阅读 · 2020年3月12日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

Distributed Machine Learning on Mobile Devices: A Survey

Distributed Machine Learning on Mobile Devices: A Survey

Arxiv

37+阅读 · 2019年9月18日

Learning by Abstraction: The Neural State Machine

Learning by Abstraction: The Neural State Machine

Arxiv

6+阅读 · 2019年7月11日

Taking Human out of Learning Applications: A Survey on Automated Machine Learning

Taking Human out of Learning Applications: A Survey on Automated Machine Learning

Arxiv

14+阅读 · 2019年1月17日

Dynamic Control Flow in Large-Scale Machine Learning

Arxiv

3+阅读 · 2018年5月4日

BigDL: A Distributed Deep Learning Framework for Big Data

Arxiv

4+阅读 · 2018年4月16日

VIP会员

文章信息

相关主题

分布式机器学习

Machine Learning

相关VIP内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

专知会员服务

124+阅读 · 2019年12月23日

机器学习与物理科学（Machine learning and the physical sciences），附44页pdf

机器学习与物理科学（Machine learning and the physical sciences），附44页pdf

专知会员服务

51+阅读 · 2019年12月10日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【新书】面向企业的图学习扩展：生产级图学习与推理，485页pdf

AI智能体编程：技术、挑战与机遇综述

【国家标准】数据安全技术数据安全风险评估方法

【CMU博士论文】交互式学习的进展：替代性反馈机制与自适应因果推理

相关资讯

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

计算机类 | PLDI 2020等国际会议信息6条

计算机类 | PLDI 2020等国际会议信息6条

Call4Papers

3+阅读 · 2019年7月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

视觉机械臂 visual-pushing-grasping

视觉机械臂 visual-pushing-grasping

CreateAMind

3+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

Flexible Parallel Learning in Edge Scenarios: Communication, Computational and Energy Cost

Arxiv

0+阅读 · 2022年1月19日

ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table

Arxiv

7+阅读 · 2021年4月17日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

Arxiv

7+阅读 · 2020年3月12日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

Distributed Machine Learning on Mobile Devices: A Survey

Distributed Machine Learning on Mobile Devices: A Survey

Arxiv

37+阅读 · 2019年9月18日

Learning by Abstraction: The Neural State Machine

Learning by Abstraction: The Neural State Machine

Arxiv

6+阅读 · 2019年7月11日

Taking Human out of Learning Applications: A Survey on Automated Machine Learning

Taking Human out of Learning Applications: A Survey on Automated Machine Learning

Arxiv

14+阅读 · 2019年1月17日

Dynamic Control Flow in Large-Scale Machine Learning

Arxiv

3+阅读 · 2018年5月4日

BigDL: A Distributed Deep Learning Framework for Big Data

Arxiv

4+阅读 · 2018年4月16日

微信扫码咨询专知VIP会员