Lightning: Scaling the GPU Programming Model Beyond a Single GPU - Zhuanzhi (专知) Papers


GPU · memory capacity · scaling · kernelization · MoDELS

February 11, 2022

Lightning: Scaling the GPU Programming Model Beyond a Single GPU


Stijn Heldens, Pieter Hijma, Ben van Werkhoven, Jason Maassen, Rob V. van Nieuwpoort

From arXiv; to be published at the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS)

The GPU programming model is primarily designed to support the development of applications that run on one GPU. However, just a single GPU is limited in its capabilities in terms of memory capacity and compute power. To handle large problems that exceed these capabilities, one must rewrite application code to manually transfer data between GPU memory and higher-level memory and/or distribute the work across multiple GPUs, possibly in multiple nodes. This means a large engineering effort is required to scale GPU applications beyond a single GPU. We present Lightning: a framework that follows the common GPU programming paradigm, but enables scaling to larger problems. Lightning enables multi-GPU execution of GPU kernels, even across multiple nodes, and seamlessly spills data to main memory and disk when required. Existing CUDA kernels can easily be adapted for use in Lightning, with data access annotations on these kernels allowing Lightning to infer their data requirements and dependencies. Lightning efficiently distributes the work/data across GPUs and maximizes efficiency by overlapping scheduling, data movement, and work when possible. We present the design and implementation of Lightning, as well as experimental results on up to 32 GPUs for eight benchmarks and an application from geospatial clustering. Evaluation shows excellent performance on problem sizes that far exceed the memory capacity of a single GPU.
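The abstract states that data access annotations let Lightning infer each kernel's data requirements. The sketch below illustrates that general idea only; it is not Lightning's API. It assumes a 1D stencil kernel in which thread i reads input[i-1] through input[i+1], splits the thread range evenly over the GPUs, and derives the input slice each GPU would need. The names `split_range`, `required_input`, and `plan`, and the offset-based form of the annotation, are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [begin, end)


def split_range(n: int, parts: int) -> List[Range]:
    """Split thread indices [0, n) into `parts` contiguous half-open chunks."""
    base, extra = divmod(n, parts)
    chunks = []
    start = 0
    for p in range(parts):
        # The first `extra` chunks take one additional thread each.
        size = base + (1 if p < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def required_input(chunk: Range, lo_off: int, hi_off: int, n: int) -> Range:
    """Input region read by the threads in `chunk`.

    Hypothetical annotation: thread i reads input[i + lo_off] through
    input[i + hi_off] inclusive, clamped to the array bounds [0, n).
    """
    begin, end = chunk
    if begin == end:
        return (begin, begin)  # an empty chunk needs no data
    return (max(0, begin + lo_off), min(n, end + hi_off))


def plan(n: int, gpus: int, lo_off: int = -1, hi_off: int = 1):
    """Pair each GPU's thread chunk with the input slice it must hold."""
    return [(c, required_input(c, lo_off, hi_off, n))
            for c in split_range(n, gpus)]


if __name__ == "__main__":
    for gpu, (threads, data) in enumerate(plan(10, 4)):
        print(f"GPU {gpu}: threads {threads}, input {data}")
```

With 10 threads on 4 GPUs, adjacent input slices overlap by one element on each side. These overlapping halo elements are the data a runtime would have to copy between GPUs, which is the kind of dependency the abstract says can be inferred from the annotations.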

