Kernel matrix-vector multiplication (KMVM) is a ubiquitous operation in machine learning and scientific computing, spanning applications from the kernel literature to signal processing. Because KMVM typically scales quadratically in both memory and time, applications are often limited by these computational constraints. We propose a novel approximation procedure, coined the Faster-Fast and Free Memory Method ($\text{F}^3$M), to address these scaling issues for KMVM. Extensive experiments demonstrate that $\text{F}^3$M has empirical \emph{linear time and memory} complexity with a relative error of order $10^{-3}$, and can compute a full KMVM for a billion points \emph{in under one minute} on a high-end GPU, yielding a significant speed-up over existing CPU methods. We further demonstrate the utility of our procedure by applying it as a drop-in replacement within the state-of-the-art GPU-based linear solver FALKON, \emph{improving speed 3--5 times} at the cost of a $<$1\% drop in accuracy.
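To make the quadratic baseline concrete, the exact operation being approximated can be sketched as follows. This is a minimal NumPy illustration, not the $\text{F}^3$M algorithm itself; the choice of a Gaussian (RBF) kernel and the \texttt{lengthscale} parameter are our assumptions for the example.

```python
import numpy as np

def naive_kmvm(x, y, v, lengthscale=1.0):
    """Exact kernel matrix-vector product K(x, y) @ v with a Gaussian kernel.

    Materializes the full (n, m) kernel matrix, so both time and memory
    scale as O(n * m) -- quadratic when n == m, which is what fast
    approximations such as F^3M aim to avoid.
    """
    # Pairwise squared distances between rows of x (n, d) and y (m, d).
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / (2.0 * lengthscale ** 2))  # (n, m) kernel matrix
    return K @ v  # length-n result vector

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3))
v = rng.normal(size=500)
out = naive_kmvm(x, x, v)
```

At a billion points, the explicit kernel matrix above would require on the order of $10^{18}$ entries, which is why an approximation with linear empirical scaling is needed.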