Although matrix multiplication plays a vital role in computational linear algebra, few efficient solutions exist for multiplying near-sparse matrices. The Sparse Approximate Matrix Multiply (SpAMM) is one of the algorithms that fills the performance gap left by traditional optimizations for dense/sparse matrix multiplication. However, existing SpAMM implementations fail to exploit the performance potential of GPUs. In this paper, we present cuSpAMM, the first parallel SpAMM algorithm optimized for multiple GPUs. We propose several performance optimizations, including a redesign of the algorithm to adapt to GPU thread parallelism, blocking strategies for memory access optimization, and acceleration with Tensor Cores. In addition, we scale cuSpAMM to run on multiple GPUs with an effective load-balancing scheme. We evaluate cuSpAMM on both synthesized and real-world datasets on multiple GPUs. The experimental results show that cuSpAMM achieves significant speedup over the vendor-optimized cuBLAS and cuSPARSE libraries.
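To make the SpAMM idea concrete, the following is a minimal sketch of the classic recursive formulation: the product is computed block-recursively, and a quadrant product is skipped whenever the product of its operands' Frobenius norms falls below a tolerance. The function name, leaf size, and power-of-two shape assumption are illustrative only; this is not the paper's cuSpAMM implementation.

```python
import numpy as np

def spamm(A, B, tol, leaf=64):
    """Minimal recursive SpAMM sketch (illustrative, not cuSpAMM).
    Assumes square matrices whose size is leaf * 2**k."""
    n = A.shape[0]
    if n <= leaf:
        return A @ B  # dense multiply at the leaf level
    h = n // 2
    C = np.zeros((n, n))
    for i in range(2):
        for j in range(2):
            for k in range(2):
                Aik = A[i*h:(i+1)*h, k*h:(k+1)*h]
                Bkj = B[k*h:(k+1)*h, j*h:(j+1)*h]
                # SpAMM pruning criterion: skip block products whose
                # norm product cannot contribute more than tol.
                if np.linalg.norm(Aik) * np.linalg.norm(Bkj) >= tol:
                    C[i*h:(i+1)*h, j*h:(j+1)*h] += spamm(Aik, Bkj, tol, leaf)
    return C
```

For near-sparse inputs, most block norms are small, so the criterion prunes the bulk of the recursive calls while bounding the error contributed by each skipped product.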