用于大型高K型螺旋性小元问题的混合精度、多GPU设计 (A Mixed Precision, Multi-GPU Design for Large-scale Top-K Sparse Eigenproblems) - 专知论文

会员服务 ·

0

稀疏 · 查准率/准确率 · Processing（编程语言） · state-of-the-art · tuning ·

2022 年 1 月 19 日

A Mixed Precision, Multi-GPU Design for Large-scale Top-K Sparse Eigenproblems

翻译：用于大型高K型螺旋性小元问题的混合精度、多GPU设计

Francesco Sgherzi,Alberto Parravicini,Marco Domenico Santambrogio

Graph analytics techniques based on spectral methods process extremely large sparse matrices with millions or even billions of non-zero values. Behind these algorithms lies the Top-K sparse eigenproblem, the computation of the largest eigenvalues and their associated eigenvectors. In this work, we leverage GPUs to scale the Top-K sparse eigenproblem to bigger matrices than previously achieved while also providing state-of-the-art execution times. We can transparently partition the computation across multiple GPUs, process out-of-core matrices, and tune precision and execution time using mixed-precision floating-point arithmetic. Overall, we are 67 times faster than the highly optimized ARPACK library running on a 104-thread CPU and 1.9 times than a recent FPGA hardware design. We also determine how mixed-precision floating-point arithmetic improves execution time by 50% over double-precision, and is 12 times more accurate than single-precision floating-point arithmetic.

翻译：基于光谱方法的图形分析技术, 以光谱方法处理极其稀少的矩阵, 其值为百万甚至数十亿非零值。这些算法背后的算法是最高- K 稀疏的半基因问题, 计算最大的电子元值及其相关的源数。在这项工作中, 我们利用 GPUs 将高- K 稀疏的源数放大为比以前更大的矩阵, 同时提供最先进的执行时间。我们可以透明地将计算方法分成多个 GPUs、进程超出核心矩阵、调整精度和执行时间, 使用混合精度浮点算法。总的来说, 我们比高度优化的 ARPACK 图书馆在104 行读的 CPU 上运行的速度快67倍, 比最近的 FPGA 硬件设计速度快1. 9 倍。我们还确定混合精度浮点算法如何将执行时间比双精度增加50%, 并且比单精度浮点算法要精确12倍。

0

相关内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

图挖掘与多关系学习，亚马逊与CMU-WWW2021教程，附161页ppt

专知会员服务

37+阅读 · 2021年4月20日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【大规模数据系统，552页ppt】Large-scale Data Systems

【大规模数据系统，552页ppt】Large-scale Data Systems

专知会员服务

61+阅读 · 2019年12月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

MEMS声衬及分布式阻抗控制研究

国家自然科学基金

0+阅读 · 2013年12月31日

数据并行与线程并行合一的可伸缩处理器体系结构

国家自然科学基金

2+阅读 · 2013年12月31日

多GPU并行的热/化学反应非平衡N-S方程求解算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

低交叉极化共形天线阵列综合的混合DE算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于GPU的并行不变特征图像匹配技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向GPU的电力系统电磁暂态并行计算方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

网络化动态耦合系统中分布式模型预测控制的协同优化

国家自然科学基金

0+阅读 · 2012年12月31日

基于GPU/CPU协同计算的城市建筑群震害模拟

国家自然科学基金

0+阅读 · 2011年12月31日

增强现实中多目标3D跟踪定位和WH-SIFT特征识别方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于NURBS曲面的弹跳射线法的GPU加速

国家自然科学基金

0+阅读 · 2008年12月31日

State machines for large scale computer software and systems

Arxiv

0+阅读 · 2022年4月19日

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

Arxiv

0+阅读 · 2022年4月19日

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

Arxiv

1+阅读 · 2022年4月18日

Self-learning Emulators and Eigenvector Continuation

Arxiv

0+阅读 · 2022年4月17日

An Efficient Approximation Algorithm for the Colonel Blotto Game

Arxiv

0+阅读 · 2022年4月16日

Wake Up and Join Me! An Energy-Efficient Algorithm for Maximal Matching in Radio Networks

Arxiv

0+阅读 · 2022年4月16日

$FM^2$: Field-matrixed Factorization Machines for Recommender Systems

Arxiv

16+阅读 · 2021年2月20日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

25+阅读 · 2020年6月3日

A Survey of Methods for Low-Power Deep Learning and Computer Vision

A Survey of Methods for Low-Power Deep Learning and Computer Vision

Arxiv

14+阅读 · 2020年3月24日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

VIP会员

文章信息

相关主题

查准率/准确率

Processing（编程语言）

state-of-the-art

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

图挖掘与多关系学习，亚马逊与CMU-WWW2021教程，附161页ppt

专知会员服务

37+阅读 · 2021年4月20日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【大规模数据系统，552页ppt】Large-scale Data Systems

【大规模数据系统，552页ppt】Large-scale Data Systems

专知会员服务

61+阅读 · 2019年12月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

《海上自主水面船舶远程操作中心：安全可持续运行的多维度分析》

多模态大语言模型下游调优中“保持自我”的重要性

隐身自主无人水下航行器技术如何变革水下作战并重塑海军竞争

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

State machines for large scale computer software and systems

Arxiv

0+阅读 · 2022年4月19日

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

Arxiv

0+阅读 · 2022年4月19日

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

Arxiv

1+阅读 · 2022年4月18日

Self-learning Emulators and Eigenvector Continuation

Arxiv

0+阅读 · 2022年4月17日

An Efficient Approximation Algorithm for the Colonel Blotto Game

Arxiv

0+阅读 · 2022年4月16日

Wake Up and Join Me! An Energy-Efficient Algorithm for Maximal Matching in Radio Networks

Arxiv

0+阅读 · 2022年4月16日

$FM^2$: Field-matrixed Factorization Machines for Recommender Systems

Arxiv

16+阅读 · 2021年2月20日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

25+阅读 · 2020年6月3日

A Survey of Methods for Low-Power Deep Learning and Computer Vision

A Survey of Methods for Low-Power Deep Learning and Computer Vision

Arxiv

14+阅读 · 2020年3月24日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

相关基金

MEMS声衬及分布式阻抗控制研究

国家自然科学基金

0+阅读 · 2013年12月31日

数据并行与线程并行合一的可伸缩处理器体系结构

国家自然科学基金

2+阅读 · 2013年12月31日

多GPU并行的热/化学反应非平衡N-S方程求解算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

低交叉极化共形天线阵列综合的混合DE算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于GPU的并行不变特征图像匹配技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向GPU的电力系统电磁暂态并行计算方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

网络化动态耦合系统中分布式模型预测控制的协同优化

国家自然科学基金

0+阅读 · 2012年12月31日

基于GPU/CPU协同计算的城市建筑群震害模拟

国家自然科学基金

0+阅读 · 2011年12月31日

增强现实中多目标3D跟踪定位和WH-SIFT特征识别方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于NURBS曲面的弹跳射线法的GPU加速

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员