以斯特拉斯森为基础的矩阵通过其转换器的可有效平行乘法 (Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose) - 专知论文

会员服务 ·

0

转置 · Extensibility · Performer · state-of-the-art · 计算成本 ·

2021 年 10 月 25 日

Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose

翻译：以斯特拉斯森为基础的矩阵通过其转换器的可有效平行乘法

Viviana Arrigoni,Filippo Maggioli,Annalisa Massini,Emanuele Rodolà

The multiplication of a matrix by its transpose, $A^T A$, appears as an intermediate operation in the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm (ATA) for computing this product, based upon the classical Strassen algorithm as a sub-routine. In particular, we decrease the computational cost to $\frac{2}{3}$ the time required by Strassen's algorithm, amounting to $\frac{14}{3}n^{\log_2 7}$ floating point operations. ATA works for generic rectangular matrices, and exploits the peculiar symmetry of the resulting product matrix for saving memory. In addition, we provide an extensive implementation study of ATA in a shared memory system, and extend its applicability to a distributed environment. To support our findings, we compare our algorithm with state-of-the-art solutions specialized in the computation of $A^T A$. Our experiments highlight good scalability with respect to both the matrix size and the number of involved processes, as well as favorable performance for both the parallel paradigms and the sequential implementation, when compared with other methods in the literature.

翻译：变换后的矩阵乘法( $A_T A$) 似乎是解决一系列广泛问题的一种中间操作。在本文中, 我们基于古典Strassen 算法作为子常规, 提议一种新的缓存隐蔽算法(ATA) 来计算这一产品。特别是, 我们将计算成本降低到 $frac{ 2 ⁇ 3} 美元, 也就是Strassen 算法所需要的时间, 即$frac{14}3}n ⁇ log_ 2 7} 浮动点操作所需的时间。 ATA 工作为通用矩形矩阵, 并开发由此产生的产品矩阵的特殊对称以保存记忆。此外, 我们还在共享记忆系统中对ATA进行广泛的执行研究, 并将其应用范围扩大到分布式环境。为了支持我们的发现, 我们比较我们的算法与在计算 $A_T A$中专门使用的最新解决方案。我们的实验显示, 在矩阵大小和所涉进程数量方面, 以及与其他文献的平行模式和连续执行方法相比, 在平行模式和连续执行中, 我们的绩效表现良好。

0

相关内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【KDD2020】基于矩阵和张量因子分解的高效自动机器学习搜索，Efficient AutoML Pipeline Search with Matrix and Tensor Factorization

【KDD2020】基于矩阵和张量因子分解的高效自动机器学习搜索，Efficient AutoML Pipeline Search with Matrix and Tensor Factorization

专知会员服务

13+阅读 · 2020年6月10日

【IJCAI2020】TransOMCS: 从语言图谱到常识图谱

【IJCAI2020】TransOMCS: 从语言图谱到常识图谱

专知会员服务

35+阅读 · 2020年5月4日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【MIT】生成模型提出的分子的可合成性，48页pdf,The Synthesizability of Molecules Proposed by Generative Models

【MIT】生成模型提出的分子的可合成性，48页pdf,The Synthesizability of Molecules Proposed by Generative Models

专知会员服务

28+阅读 · 2020年2月20日

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

专知会员服务

124+阅读 · 2019年12月23日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

Unique sparse decomposition of low rank matrices

Arxiv

0+阅读 · 2021年12月21日

One-way or Two-way Factor Model for Matrix Sequences?

Arxiv

0+阅读 · 2021年12月19日

Convergence on a symmetric accelerated stochastic ADMM with larger stepsizes

Arxiv

0+阅读 · 2021年12月19日

Diversity/Parallelism Trade-off in Distributed Systems with Redundancy

Arxiv

0+阅读 · 2021年12月18日

A Mathematical Negotiation Mechanism for Distributed Procurement Problems and a Hybrid Algorithm for its Solution

Arxiv

0+阅读 · 2021年12月18日

Community-based Layerwise Distributed Training of Graph Convolutional Networks

Arxiv

0+阅读 · 2021年12月17日

Multiplying Matrices Without Multiplying

Arxiv

9+阅读 · 2021年6月21日

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

Arxiv

4+阅读 · 2018年10月24日

Testing Matrix Rank, Optimally

Arxiv

3+阅读 · 2018年10月18日

Optimal Algorithms for Distributed Optimization

Arxiv

3+阅读 · 2017年12月1日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【KDD2020】基于矩阵和张量因子分解的高效自动机器学习搜索，Efficient AutoML Pipeline Search with Matrix and Tensor Factorization

【KDD2020】基于矩阵和张量因子分解的高效自动机器学习搜索，Efficient AutoML Pipeline Search with Matrix and Tensor Factorization

专知会员服务

13+阅读 · 2020年6月10日

【IJCAI2020】TransOMCS: 从语言图谱到常识图谱

【IJCAI2020】TransOMCS: 从语言图谱到常识图谱

专知会员服务

35+阅读 · 2020年5月4日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【MIT】生成模型提出的分子的可合成性，48页pdf,The Synthesizability of Molecules Proposed by Generative Models

【MIT】生成模型提出的分子的可合成性，48页pdf,The Synthesizability of Molecules Proposed by Generative Models

专知会员服务

28+阅读 · 2020年2月20日

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

专知会员服务

124+阅读 · 2019年12月23日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《人与智能体在系统工程建模语言V2任务中的性能表现：基于用户中心化的评估方法》308页

《数据安全国家标准体系（2025版）》征求意见稿

AlphaMosaic：人工智能赋能的作战管理系统

《军事行动中通信平台的战略价值：提升战术效能与作战优势》

相关资讯

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

相关论文

Unique sparse decomposition of low rank matrices

Arxiv

0+阅读 · 2021年12月21日

One-way or Two-way Factor Model for Matrix Sequences?

Arxiv

0+阅读 · 2021年12月19日

Convergence on a symmetric accelerated stochastic ADMM with larger stepsizes

Arxiv

0+阅读 · 2021年12月19日

Diversity/Parallelism Trade-off in Distributed Systems with Redundancy

Arxiv

0+阅读 · 2021年12月18日

A Mathematical Negotiation Mechanism for Distributed Procurement Problems and a Hybrid Algorithm for its Solution

Arxiv

0+阅读 · 2021年12月18日

Community-based Layerwise Distributed Training of Graph Convolutional Networks

Arxiv

0+阅读 · 2021年12月17日

Multiplying Matrices Without Multiplying

Arxiv

9+阅读 · 2021年6月21日

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

Arxiv

4+阅读 · 2018年10月24日

Testing Matrix Rank, Optimally

Arxiv

3+阅读 · 2018年10月18日

Optimal Algorithms for Distributed Optimization

Arxiv

3+阅读 · 2017年12月1日

微信扫码咨询专知VIP会员