Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression, and deep learning. Typical algorithms for computing these decompositions have polynomial complexity, which makes them costly in both computation and time. In this work, we reduce this burden by leveraging efficient operations that run in parallel on modern Graphics Processing Units (GPUs), the predominant computing architecture in, e.g., deep learning. More specifically, we reformulate the randomized decomposition problem so that fast matrix multiplication operations (BLAS-3) serve as its building blocks. We show that this formulation, combined with fast random number generators, fully exploits the parallel processing capabilities of GPUs. Our extensive evaluation confirms the superiority of this approach over competing methods, and we release the results of this research as part of the official CUDA implementation (https://docs.nvidia.com/cuda/cusolver/index.html).
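To make the BLAS-3 structure concrete, the following is a minimal NumPy sketch of the standard randomized SVD (a Gaussian range finder followed by a small deterministic SVD), not the paper's actual cuSOLVER implementation; the function name and the oversampling and power-iteration parameters (p, n_iter) are illustrative choices. The point of the sketch is that almost all of the work sits in a handful of large matrix products, which on a GPU map directly to parallel GEMM (BLAS-3) calls, while the test matrix Omega is produced by a fast random number generator.

```python
import numpy as np

def randomized_svd(A, k, p=10, n_iter=2, rng=None):
    """Rank-k randomized SVD sketch (illustrative, CPU/NumPy).

    A      : (m, n) input matrix
    k      : target rank
    p      : oversampling parameter (assumption: small constant, e.g. 10)
    n_iter : number of power iterations for slowly decaying spectra
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape

    # Gaussian test matrix -- generated by a fast RNG in the GPU setting.
    Omega = rng.standard_normal((n, k + p))

    # Range sketch: one large matrix product (a BLAS-3 GEMM on a GPU).
    Y = A @ Omega

    # Each power iteration costs two more GEMMs and sharpens the sketch.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)

    # Orthonormal basis for the sketched range.
    Q, _ = np.linalg.qr(Y)

    # Project onto the small (k+p)-dimensional subspace: another GEMM.
    B = Q.T @ A

    # Deterministic SVD of the small matrix, then lift back with a GEMM.
    Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uhat
    return U[:, :k], s[:k], Vt[:k, :]

if __name__ == "__main__":
    # Usage on a synthetic low-rank matrix: the relative error should be
    # near machine precision because the true rank (40) is below k.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 40)) @ rng.standard_normal((40, 1500))
    U, s, Vt = randomized_svd(A, k=50)
    print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```

The design choice the abstract alludes to is visible here: apart from one thin QR and one small SVD, every step is a dense matrix-matrix product, which is exactly the operation GPUs execute most efficiently.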