Random projection can reduce the dimensionality of data while preserving its structure, and it is a fundamental tool in machine learning, signal processing, and information retrieval, fields that today deal with large amounts of data. RandNLA (Randomized Numerical Linear Algebra) leverages random projection to reduce the computational complexity of low-rank tensor decompositions and least-squares problems. Although the random projection itself is a simple matrix multiplication, its asymptotic computational complexity is typically higher than that of the other operations in a RandNLA algorithm. Therefore, various studies have proposed methods for reducing its computational complexity. We propose a fast mixed-precision random projection method for single-precision tensors using Tensor Cores on NVIDIA GPUs. We exploit the fact that the random matrix requires less precision, and we develop a highly optimized matrix multiplication between FP32 and FP16 matrices -- SHGEMM (Single- and Half-precision GEMM) -- on Tensor Cores, where the random matrix is stored in FP16. Our method computes Randomized SVD 1.28 times faster and Random projection high order SVD 1.75 times faster than baseline single-precision implementations while maintaining accuracy.
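To make the mixed-precision idea concrete, below is a minimal CUDA sketch of a product C = A * B on Tensor Cores via the WMMA API, where the FP32 data matrix A is cast tile-by-tile to FP16 on the fly and the random matrix B is already stored in FP16, with accumulation in FP32. This is an illustration under stated assumptions, not the paper's optimized SHGEMM kernel: the kernel name shgemm_sketch, the one-warp-per-block layout, the plain FP32-to-FP16 cast (the real SHGEMM presumably handles the precision loss of this cast more carefully), and the requirement that all dimensions are multiples of 16 are assumptions for brevity.

#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

constexpr int TILE = 16;  // WMMA m = n = k = 16 for FP16 inputs, FP32 accumulation

// Illustrative mixed-precision GEMM: C (FP32, m x n) = A (FP32, m x k) * B (FP16, k x n).
// One warp per block; each block computes one 16x16 tile of C.
// Assumes row-major matrices and m, n, k all multiples of 16.
__global__ void shgemm_sketch(int m, int n, int k,
                              const float* A, int lda,
                              const half*  B, int ldb,
                              float*       C, int ldc) {
    const int tile_row = blockIdx.y * TILE;
    const int tile_col = blockIdx.x * TILE;

    // Staging buffer: the FP32 tile of A is cast to FP16 before it is fed
    // to the Tensor Cores; the FP16 random matrix B needs no conversion.
    __shared__ half a_f16[TILE * TILE];

    wmma::fragment<wmma::matrix_a, TILE, TILE, TILE, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, TILE, TILE, TILE, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, TILE, TILE, TILE, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);

    for (int kk = 0; kk < k; kk += TILE) {
        // The 32 lanes of the warp convert the 256 FP32 elements of the
        // current A tile to FP16 in shared memory.
        for (int i = threadIdx.x; i < TILE * TILE; i += 32) {
            const int r = i / TILE, c = i % TILE;
            a_f16[i] = __float2half(A[(tile_row + r) * lda + (kk + c)]);
        }
        __syncwarp();
        wmma::load_matrix_sync(a_frag, a_f16, TILE);
        wmma::load_matrix_sync(b_frag, B + kk * ldb + tile_col, ldb);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // FP32 accumulate
        __syncwarp();  // a_f16 may be overwritten in the next iteration
    }
    wmma::store_matrix_sync(C + tile_row * ldc + tile_col, c_frag, ldc,
                            wmma::mem_row_major);
}

// Example launch (one warp per 16x16 output tile):
//   shgemm_sketch<<<dim3(n / 16, m / 16), 32>>>(m, n, k, dA, k, dB, n, dC, n);

Storing only the random matrix in FP16 halves its memory traffic and enables Tensor Core throughput while keeping the data matrix and the accumulator in FP32, which is why the precision of the projection result can be maintained.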