Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically communication-optimal 2.5D decomposition. We first establish a theoretical framework for deriving parallel I/O lower bounds for linear algebra kernels, and then utilize its insights to derive Cholesky and LU schedules, both communicating N^3/(P*sqrt(M)) elements per processor, where M is the local memory size. The empirical results match our theoretical analysis: our implementations communicate significantly less than Intel MKL, SLATE, and the asymptotically communication-optimal CANDMC and CAPITAL libraries. Our code outperforms these state-of-the-art libraries in almost all tested scenarios, with matrix sizes ranging from 2,048 to 262,144 on up to 512 CPU nodes of the Piz Daint supercomputer, reducing time-to-solution by up to a factor of three. Our code is ScaLAPACK-compatible and available as an open-source library.
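The stated per-processor communication volume, N^3/(P*sqrt(M)), can be made concrete with a small numeric sketch. The snippet below is purely illustrative (the parameter values are hypothetical, not taken from the paper's experiments) and drops all constant factors; it only shows the leading-order behavior, in particular that enlarging the local memory M by a replication factor c (the essence of a 2.5D decomposition) shrinks the communicated volume by sqrt(c).

```python
import math

def comm_per_processor(N, P, M):
    """Leading-order communication volume per processor (in elements),
    following the N^3 / (P * sqrt(M)) bound stated in the abstract.
    Constant factors are omitted."""
    return N**3 / (P * math.sqrt(M))

# Hypothetical parameters for illustration only.
N, P = 2**15, 512
base_mem = N**2 // P  # minimal memory: one share of the matrix per processor

# Replicating the matrix c-fold (2.5D-style) multiplies M by c,
# so the bound drops by a factor of sqrt(c).
for c in (1, 4, 16):
    vol = comm_per_processor(N, P, c * base_mem)
    print(f"c={c:2d}: {vol:.3e} elements per processor")
```

Note the diminishing returns: quadrupling memory only halves communication, which is why 2.5D schedules pick the replication factor to balance memory footprint against communication savings.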