Multi-GPU aggregation-based AMG preconditioner for iterative linear solvers - Zhuanzhi Papers

Linear · GPU · Sparse · PARCO · CASES

March 4, 2023

Multi-GPU aggregation-based AMG preconditioner for iterative linear solvers

Massimo Bernaschi,Alessandro Celestini,Pasqua D'Ambra,Flavio Vella

We present, and release as open source, a sparse linear solver that efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large, sparse linear systems on modern parallel computers built from hybrid nodes hosting NVIDIA Graphics Processing Unit (GPU) accelerators. The work extends our previous efforts on exploiting a single GPU accelerator and proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. Our design for the hybrid implementation has been driven by best practices for minimizing data communication overhead when multiple GPUs are employed, while preserving the efficiency of the single-GPU kernels. We discuss strong and weak scalability results of the new version of the library on well-known benchmark test cases. Comparisons with NVIDIA's AmgX show an improvement of up to 2.0x in the solve phase.
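To make the structure "Krylov solver + aggregation-based AMG preconditioner" concrete, here is a minimal single-process sketch in pure Python. It is not the BootCMatchG or AmgX implementation and uses none of their APIs; every function name is hypothetical. Assumptions: conjugate gradient as the Krylov method, a two-level (not multilevel) hierarchy, greedy pairwise aggregation as a simple stand-in for the library's matching-based coarsening, damped Jacobi smoothing, an exact dense Cholesky coarse solve, and a 1D Poisson test matrix. The MPI-CUDA distribution across GPUs is not modeled.

```python
from math import sqrt


def poisson_1d(n):
    # Hypothetical test matrix tridiag(-1, 2, -1), stored row-wise as (col, val) pairs.
    return [[(j, 2.0 if j == i else -1.0) for j in (i - 1, i, i + 1) if 0 <= j < n]
            for i in range(n)]


def matvec(rows, x):
    return [sum(v * x[j] for j, v in row) for row in rows]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pairwise_aggregates(rows):
    # Greedy pairing: each unassigned node joins its strongest unassigned neighbour.
    # Simple stand-in for matching-based coarsening, not the BootCMatchG algorithm.
    agg, nc = [-1] * len(rows), 0
    for i, row in enumerate(rows):
        if agg[i] >= 0:
            continue
        cand = [(abs(v), j) for j, v in row if j != i and agg[j] < 0]
        agg[i] = nc
        if cand:
            agg[max(cand)[1]] = nc
        nc += 1
    return agg, nc


def cholesky_factor_solver(a):
    # Dense Cholesky of the small coarse matrix; returns a solve(b) closure.
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]

    def solve(b):
        y = [0.0] * n
        for i in range(n):  # forward substitution: L y = b
            y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
        x = [0.0] * n
        for i in reversed(range(n)):  # back substitution: L^T x = y
            x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
        return x

    return solve


def make_two_level(rows, omega=2.0 / 3.0):
    agg, nc = pairwise_aggregates(rows)
    ac = [[0.0] * nc for _ in range(nc)]
    for i, row in enumerate(rows):  # Galerkin A_c = P^T A P, P piecewise constant
        for j, v in row:
            ac[agg[i]][agg[j]] += v
    coarse_solve = cholesky_factor_solver(ac)
    diag = [next(v for j, v in row if j == i) for i, row in enumerate(rows)]

    def jacobi(x, r):  # one damped-Jacobi sweep for A x = r
        ax = matvec(rows, x)
        return [xi + omega * (ri - axi) / d for xi, ri, axi, d in zip(x, r, ax, diag)]

    def apply(r):  # symmetric cycle: pre-smooth, coarse correction, post-smooth
        z = jacobi([0.0] * len(r), r)
        rc = [0.0] * nc
        for i, res in enumerate(ri - azi for ri, azi in zip(r, matvec(rows, z))):
            rc[agg[i]] += res  # restriction P^T
        ec = coarse_solve(rc)
        z = [zi + ec[agg[i]] for i, zi in enumerate(z)]  # prolongation P
        return jacobi(z, r)

    return apply


def pcg(rows, b, precond, tol=1e-10, maxit=200):
    # Preconditioned conjugate gradient (the Krylov method in this sketch).
    x, r = [0.0] * len(b), list(b)
    z = precond(r)
    p, rz, bnorm = list(z), dot(r, z), sqrt(dot(b, b))
    for it in range(1, maxit + 1):
        ap = matvec(rows, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if sqrt(dot(r, r)) <= tol * bnorm:
            return x, it, True
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxit, False
```

For example, with `A = poisson_1d(64)`, the call `pcg(A, [1.0] * 64, make_two_level(A))` returns the approximate solution, the iteration count, and a convergence flag. The same Jacobi sweep is used before and after the coarse correction so that the preconditioner stays symmetric positive definite, which conjugate gradient requires.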
