A numerically stable communication-avoiding s-step GMRES algorithm - 专知 (Zhuanzhi) Papers


March 15, 2023

A numerically stable communication-avoiding s-step GMRES algorithm


Zan Xu,Juan J. Alonso,Eric Darve

from arxiv, 32 pages, 14 figures

Krylov subspace methods are extensively used in scientific computing to solve large-scale linear systems. However, the performance of these iterative Krylov solvers on modern supercomputers is limited by expensive communication costs. The $s$-step strategy generates a series of $s$ Krylov vectors at a time to avoid communication. Asymptotically, the $s$-step approach can reduce communication latency by a factor of $s$. Unfortunately, because of finite-precision arithmetic, the step size has to be kept small for stability. In this work, we tackle the numerical instabilities encountered in the $s$-step GMRES algorithm. By choosing an appropriate polynomial basis and block orthogonalization schemes, we construct a communication-avoiding $s$-step GMRES algorithm that automatically selects the optimal step size to ensure numerical stability. To further maximize communication savings, we introduce scaled Newton polynomials that can increase the step size $s$ to a few hundred for many problems. An initial step size estimator is also developed to efficiently choose the optimal step size for stability. The guaranteed stability of the proposed algorithm is demonstrated using numerical experiments. In the process, we also evaluate how the choice of polynomial and preconditioning affects the stability limit of the algorithm. Finally, we show parallel scalability on more than 14,000 cores in a distributed-memory setting. Perfectly linear scaling has been observed in both strong and weak scaling studies with negligible communication costs.
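The abstract's point about polynomial bases can be illustrated with a small experiment. The following is a minimal sketch, not the authors' algorithm: it builds $s$ Krylov basis vectors for a toy diagonal matrix, once with the monomial basis $v_{k+1} = A v_k$ and once with a normalized Newton basis $v_{k+1} = (A - \theta_k I) v_k$ whose shifts are Leja-ordered. The test matrix, the use of exact eigenvalues as shifts (in practice one would typically use estimates such as Ritz values), and all function names are assumptions made for illustration. The largest cosine between consecutive basis vectors serves as a rough proxy for how close the basis is to linear dependence.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean norm."""
    nrm = math.sqrt(sum(x * x for x in v))
    return [x / nrm for x in v]

def leja_order(points, count):
    """Greedily pick `count` points, each maximizing the product of
    distances to the points already chosen (a Leja ordering)."""
    chosen = [max(points, key=abs)]
    while len(chosen) < count:
        remaining = [p for p in points if p not in chosen]
        chosen.append(max(remaining,
                          key=lambda p: math.prod(abs(p - c) for c in chosen)))
    return chosen

def s_step_basis(diag, v0, s, shifts=None):
    """Return s+1 normalized basis vectors for the diagonal matrix `diag`.
    shifts=None gives the monomial basis v_{k+1} = A v_k; otherwise the
    Newton basis v_{k+1} = (A - shifts[k] I) v_k."""
    basis = [normalize(v0)]
    for k in range(s):
        theta = 0.0 if shifts is None else shifts[k]
        w = [(d - theta) * x for d, x in zip(diag, basis[-1])]
        basis.append(normalize(w))
    return basis

def max_consecutive_cosine(basis):
    """Largest |cosine| between consecutive basis vectors (1.0 = parallel)."""
    return max(abs(sum(a * b for a, b in zip(u, v)))
               for u, v in zip(basis, basis[1:]))

# Hypothetical toy problem: A = diag(1, ..., 50), step size s = 20.
n, s = 50, 20
diag = [float(i) for i in range(1, n + 1)]
v0 = [1.0] * n

monomial = s_step_basis(diag, v0, s)
newton = s_step_basis(diag, v0, s, leja_order(diag, s))

cos_monomial = max_consecutive_cosine(monomial)
cos_newton = max_consecutive_cosine(newton)
print("monomial basis, max consecutive cosine:", cos_monomial)
print("Newton basis,   max consecutive cosine:", cos_newton)
```

In this toy setting the monomial vectors align with the dominant eigenvector as $k$ grows, so consecutive vectors become nearly parallel, while the shifted Newton vectors stay better separated. This only hints at why the basis choice limits the usable step size; it says nothing about the block orthogonalization or step size selection described in the abstract.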

