Krylov subspace methods are extensively used in scientific computing to solve large-scale linear systems. However, the performance of these iterative Krylov solvers on modern supercomputers is limited by expensive communication costs. The $s$-step strategy generates a series of $s$ Krylov vectors at a time to avoid communication. Asymptotically, the $s$-step approach can reduce communication latency by a factor of $s$. Unfortunately, due to finite-precision arithmetic, the step size must be kept small for stability. In this work, we tackle the numerical instabilities encountered in the $s$-step GMRES algorithm. By choosing an appropriate polynomial basis and block orthogonalization scheme, we construct a communication-avoiding $s$-step GMRES algorithm that automatically selects the optimal step size to ensure numerical stability. To further maximize communication savings, we introduce scaled Newton polynomials that can increase the step size $s$ to a few hundred for many problems. An initial step-size estimator is also developed to efficiently choose the optimal step size for stability. The guaranteed stability of the proposed algorithm is demonstrated through numerical experiments. In the process, we also evaluate how the choice of polynomial basis and preconditioning affects the stability limit of the algorithm. Finally, we show parallel scalability on more than 14,000 cores in a distributed-memory setting. Perfectly linear scaling is observed in both strong and weak scaling studies, with negligible communication costs.
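To illustrate the basis construction that underlies the $s$-step strategy, the following is a minimal sketch (not the paper's implementation) of generating $s+1$ Krylov basis vectors at a time with a scaled Newton polynomial recurrence, $v_{j+1} = (A - \theta_j I)\, v_j / \sigma_j$. The shifts $\theta_j$ are assumed to be supplied by the caller (typically Ritz values of $A$), and the scaling factors $\sigma_j$ keep the vectors near unit norm, which is what allows the step size to grow without the basis becoming numerically rank-deficient the way a monomial basis $\{v, Av, A^2v, \dots\}$ does.

```python
import numpy as np

def newton_basis(A, v0, s, shifts, scales=None):
    """Generate s+1 scaled Newton-polynomial basis vectors spanning the
    Krylov subspace span{v0, A v0, ..., A^s v0}.

    Recurrence: v_{j+1} = (A - shifts[j] I) v_j / sigma_j.
    If `scales` is None, each new vector is normalized to unit length
    (a simple stand-in for the scaling strategy discussed in the paper).
    """
    n = v0.shape[0]
    V = np.zeros((n, s + 1))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(s):
        # One sparse-matrix-vector product per basis vector; in the
        # s-step setting all s products are done before orthogonalization,
        # so the expensive global communication happens once per s steps.
        w = A @ V[:, j] - shifts[j] * V[:, j]
        sigma = scales[j] if scales is not None else np.linalg.norm(w)
        V[:, j + 1] = w / sigma
    return V
```

With shifts spread across the spectrum of $A$, the resulting basis stays far better conditioned than the monomial basis for the same $s$, which is the property the abstract's larger step sizes rely on.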