Deep neural networks (DNNs) have grown exponentially in complexity and size over the past decade, leaving only those who have access to massive datacenter-based resources with the ability to develop and train such models. One of the main challenges for the long tail of researchers who might have access to only limited resources (e.g., a single multi-GPU server) is limited GPU memory capacity compared to model size. The problem is so acute that the memory requirement of training large DNN models can often exceed the aggregate capacity of all available GPUs on commodity servers; this problem only gets worse with the trend of ever-growing model sizes. Current solutions that rely on virtualizing GPU memory (by swapping to/from CPU memory) incur excessive swapping overhead. In this paper, we present a new training framework, Harmony, and advocate rethinking how DNN frameworks schedule computation and move data to push the boundaries of training large models efficiently on modest multi-GPU deployments. Across many large DNN models, Harmony is able to reduce swap load by up to two orders of magnitude and obtain a training throughput speedup of up to 7.6x over highly optimized baselines with virtualized memory.
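To make the swapping mechanism concrete, below is a minimal PyTorch sketch (not Harmony's implementation) of the offload/prefetch pattern that virtualized-GPU-memory systems build on: a large tensor is copied to pinned CPU memory when GPU memory is scarce and copied back before it is needed again. The helper names `swap_out`/`swap_in` and the tensor size are illustrative assumptions; real systems overlap these transfers with computation rather than synchronizing.

```python
# Minimal sketch of GPU-memory swapping, assuming a CUDA device is available.
import torch

def swap_out(t: torch.Tensor) -> torch.Tensor:
    """Copy a GPU tensor into pinned CPU memory so the GPU copy can be freed."""
    cpu_copy = torch.empty(t.shape, dtype=t.dtype, device="cpu", pin_memory=True)
    cpu_copy.copy_(t, non_blocking=True)
    return cpu_copy

def swap_in(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy a CPU-resident tensor back onto the GPU before it is reused."""
    return t.to(device, non_blocking=True)

if torch.cuda.is_available():
    dev = torch.device("cuda:0")
    act = torch.randn(4096, 4096, device=dev)  # stand-in for a large activation
    act_cpu = swap_out(act)                    # offload to CPU to relieve GPU memory
    torch.cuda.synchronize()                   # real systems overlap this with compute
    del act
    torch.cuda.empty_cache()                   # GPU memory is now free for other tensors
    act = swap_in(act_cpu, dev)                # prefetch back before reuse
```

Each such round trip crosses the PCIe bus twice, which is why naive swapping incurs the overhead the abstract refers to.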