Leveraging GPU batching for scalable nonlinear programming through massive Lagrangian decomposition

June 28, 2021

Youngdae Kim, François Pacaud, Kibaek Kim, Mihai Anitescu

We present ExaTron, an implementation of a trust-region Newton algorithm for bound-constrained nonlinear programming problems that runs entirely on multiple GPUs. Because no data is transferred between CPU and GPU, our implementation eliminates a major performance bottleneck in memory-bound situations, particularly when solving many small problems in batch. We discuss the design principles and implementation details of our kernel function and core operations, and justify the design choices with numerical experiments. Using distributed control of alternating current optimal power flow as the application, where a large problem is decomposed into many smaller nonlinear programs by a Lagrangian approach, we demonstrate the computational performance of ExaTron on the Summit supercomputer at Oak Ridge National Laboratory. Our numerical results show linear scaling with respect to the batch size and the number of GPUs, and a speedup of more than 35 times on 6 GPUs over the 40 CPUs available on a single node.
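To make the idea of "many small bound-constrained problems solved in batch" concrete, here is a minimal sketch in plain Python. It is not ExaTron's code or its GPU kernel: the names `Problem`, `solve_one`, `solve_batch`, and `make_quadratic` are hypothetical, the problems are one-dimensional so that the trust-region subproblem can be solved exactly, and the acceptance and radius-update constants are common textbook choices rather than values from the paper. The plain loop in `solve_batch` stands in for whatever parallel mapping a GPU implementation would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    """One small problem: minimize f(x) subject to lo <= x <= hi (hypothetical type)."""
    f: Callable[[float], float]   # objective
    g: Callable[[float], float]   # first derivative
    h: Callable[[float], float]   # second derivative
    lo: float                     # lower bound
    hi: float                     # upper bound
    x0: float                     # starting point


def clip(v: float, lo: float, hi: float) -> float:
    return min(max(v, lo), hi)


def solve_one(p: Problem, tol: float = 1e-8, max_iter: int = 100) -> float:
    """Trust-region Newton iteration for a 1-D bound-constrained problem."""
    x = clip(p.x0, p.lo, p.hi)
    delta = 1.0  # trust-region radius (assumed initial value)
    for _ in range(max_iter):
        g, h = p.g(x), p.h(x)
        # Projected-gradient test: zero exactly at a stationary point of the
        # bound-constrained problem.
        if abs(x - clip(x - g, p.lo, p.hi)) <= tol:
            break
        # Steps must stay inside both the trust region and the bounds.
        s_lo, s_hi = max(-delta, p.lo - x), min(delta, p.hi - x)

        def model(s: float) -> float:
            # Quadratic model of f(x + s) - f(x).
            return g * s + 0.5 * h * s * s

        # In 1-D the model minimizer over [s_lo, s_hi] is an endpoint or,
        # when the model is convex, the clipped Newton step.
        candidates = [s_lo, s_hi]
        if h > 0:
            candidates.append(clip(-g / h, s_lo, s_hi))
        s = min(candidates, key=model)
        pred = -model(s)  # predicted reduction
        if pred <= 0.0:
            break
        rho = (p.f(x) - p.f(x + s)) / pred  # actual vs. predicted reduction
        if rho > 1e-4:                      # accept the step
            x = clip(x + s, p.lo, p.hi)
        if rho < 0.25:                      # poor model fit: shrink the region
            delta *= 0.25
        elif rho > 0.75:                    # good model fit: allow growth
            delta = max(delta, 2.0 * abs(s))
    return x


def solve_batch(problems: List[Problem]) -> List[float]:
    # The problems are independent, so they could be solved concurrently;
    # a sequential loop is used here only for illustration.
    return [solve_one(p) for p in problems]


def make_quadratic(c: float) -> Problem:
    """Hypothetical test problem: minimize (x - c)^2 on [0, 1]."""
    return Problem(f=lambda x: (x - c) ** 2, g=lambda x: 2.0 * (x - c),
                   h=lambda x: 2.0, lo=0.0, hi=1.0, x0=0.5)


if __name__ == "__main__":
    print(solve_batch([make_quadratic(c) for c in (-1.0, 0.3, 2.0)]))
```

Each problem in the batch shares no state with the others, which is the property that makes batching attractive when a large problem has been decomposed into many small ones.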
