CPU/GPU 建筑结构中带有缩放数据的自动模型选择的 CPU/GPU 结构中密度和散散射数据流模模版NMF (Distributed Out-of-Memory NMF of Dense and Sparse Data on CPU/GPU Architectures with Automatic Model Selection for Exascale Data) - 专知论文

会员服务 ·

0

可约的 · 稀疏 · 模型选择 · Integration · 分解的 ·

2022 年 2 月 19 日

Distributed Out-of-Memory NMF of Dense and Sparse Data on CPU/GPU Architectures with Automatic Model Selection for Exascale Data

翻译：CPU/GPU 建筑结构中带有缩放数据的自动模型选择的 CPU/GPU 结构中密度和散散射数据流模模版NMF

Ismael Boureima,Manish Bhattarai,Maksim Eren,Erik Skau,Philip Romero,Stephan Eidenbenz,Boian Alexandrov

from arxiv, Submitted to IEEE TKDE

The need for efficient and scalable big-data analytics methods is more essential than ever due to the exploding size and complexity of globally emerging datasets. Nonnegative Matrix Factorization (NMF) is a well-known explainable unsupervised learning method for dimensionality reduction, latent feature extraction, blind source separation, data mining, and machine learning. In this paper, we introduce a new distributed out-of-memory NMF method, named pyDNMF-GPU, designed for modern heterogeneous CPU/GPU architectures that is capable of factoring exascale-sized dense and sparse matrices. Our method reduces the latency associated with local data transfer between the GPU and host using CUDA streams, and reduces the latency associated with collective communications (both intra-node and inter-node) via NCCL primitives. In addition, sparse and dense matrix multiplications are significantly accelerated with GPU cores, resulting in good scalability. We set new benchmarks for the size of the data being analyzed: in experiments, we measure up to 76x improvement on a single GPU over running on a single 18 core CPU and we show good weak scaling on up to 4096 multi-GPU cluster nodes with approximately 25,000 GPUs, when decomposing a dense 340 Terabyte-size matrix and a 11 Exabyte-size sparse matrix of density 10e-6. Finally, we integrate our method with an automatic model selection method. With this integration, we introduce a new tool that is capable of analyzing, compressing, and discovering explainable latent structures in extremely large sparse and dense data.

翻译：由于全球新兴数据集的爆炸规模和复杂性,对高效和可扩展的大数据分析方法的需要比以往更加重要,原因是全球新兴数据集的爆炸规模和复杂性。非负式矩阵化(NMF)是一个众所周知的、无法监督的学习方法,用于通过NCCL原始系统减少维度、潜在地貌提取、盲源分离、数据挖掘和机器学习。在本文件中,我们引入了一种新的分布式超模型NMF方法,名为PyDNMFF-GPU, 用于现代混合的CPU/GPU结构,能够将密度密度和稀薄的基质考虑在内。我们的方法降低了GUPU和主机主之间使用 CUDA流进行本地数据传输的透明性。此外,由于GPF核心的分散和密集性变异异性,我们为正在分析的数据大小制定了新的基准:在实验中,我们测量到76x的深度,在使用CUPUI的本地化结构中,我们测量到S-258级的深度结构中,我们测量到一个在18个核心上显示一个良好的GPU值。

0

相关内容

可约的

【干货书】开放数据结构，Open Data Structures，337页pdf

【干货书】开放数据结构，Open Data Structures，337页pdf

专知会员服务

18+阅读 · 2021年9月17日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

面向结构化数据的向量嵌入理论 | word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

面向结构化数据的向量嵌入理论 | word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

专知会员服务

52+阅读 · 2020年4月1日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

59+阅读 · 2020年1月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

专知

12+阅读 · 2018年5月18日

面向数万处理器的有限元线性方程组与模态多级算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于GPU的脉冲星宽带观测的相干消色散研究

国家自然科学基金

0+阅读 · 2013年12月31日

多元对偶小波框架的提升构造及其在图像去噪中的应用

国家自然科学基金

0+阅读 · 2013年12月31日

面向GPU的电力系统电磁暂态并行计算方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

高性能CPU/GPU协同并行可视化技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

CPU/GPU异构平台下并行保结构算法的研究

国家自然科学基金

2+阅读 · 2012年12月31日

相关于算子的Orlicz-型函数空间的实变理论

国家自然科学基金

0+阅读 · 2011年12月31日

基于Sparse-Land模型的SAR图像噪声抑制与分割

国家自然科学基金

0+阅读 · 2009年12月31日

基于图形处理器的高性能计算

国家自然科学基金

0+阅读 · 2009年12月31日

SAR图像二次成像

国家自然科学基金

5+阅读 · 2008年12月31日

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

Arxiv

0+阅读 · 2022年4月19日

Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks

Arxiv

0+阅读 · 2022年4月19日

SASG: Sparsification with Adaptive Stochastic Gradients for Communication-efficient Distributed Learning

Arxiv

0+阅读 · 2022年4月17日

Resource-Constrained Neural Architecture Search on Tabular Datasets

Arxiv

0+阅读 · 2022年4月15日

Server Free Wireless Federated Learning: Architecture, Algorithm, and Analysis

Arxiv

0+阅读 · 2022年4月15日

Efficient Architecture Search for Diverse Tasks

Arxiv

0+阅读 · 2022年4月15日

Fast Sparse Decision Tree Optimization via Reference Ensembles

Arxiv

0+阅读 · 2022年4月14日

Neural Architecture Search without Training

Neural Architecture Search without Training

Arxiv

10+阅读 · 2021年6月11日

IoT Solutions with Multi-Sensor Fusion and Signal-Image Encoding for Secure Data Transfer and Decision Making

Arxiv

38+阅读 · 2021年6月2日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

25+阅读 · 2020年6月3日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】开放数据结构，Open Data Structures，337页pdf

【干货书】开放数据结构，Open Data Structures，337页pdf

专知会员服务

18+阅读 · 2021年9月17日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

面向结构化数据的向量嵌入理论 | word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

面向结构化数据的向量嵌入理论 | word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

专知会员服务

52+阅读 · 2020年4月1日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

59+阅读 · 2020年1月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

面向性能、成本效益、云边隐私与可信性的大小语言模型协作综述

乌克兰太空研究（2022-2024年） | 176页

【CMU博士论文】大型语言模型的隐性特性

国防领域人工智能走向何方？

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

专知

12+阅读 · 2018年5月18日

相关论文

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

Arxiv

0+阅读 · 2022年4月19日

Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks

Arxiv

0+阅读 · 2022年4月19日

SASG: Sparsification with Adaptive Stochastic Gradients for Communication-efficient Distributed Learning

Arxiv

0+阅读 · 2022年4月17日

Resource-Constrained Neural Architecture Search on Tabular Datasets

Arxiv

0+阅读 · 2022年4月15日

Server Free Wireless Federated Learning: Architecture, Algorithm, and Analysis

Arxiv

0+阅读 · 2022年4月15日

Efficient Architecture Search for Diverse Tasks

Arxiv

0+阅读 · 2022年4月15日

Fast Sparse Decision Tree Optimization via Reference Ensembles

Arxiv

0+阅读 · 2022年4月14日

Neural Architecture Search without Training

Neural Architecture Search without Training

Arxiv

10+阅读 · 2021年6月11日

IoT Solutions with Multi-Sensor Fusion and Signal-Image Encoding for Secure Data Transfer and Decision Making

Arxiv

38+阅读 · 2021年6月2日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

25+阅读 · 2020年6月3日

相关基金

面向数万处理器的有限元线性方程组与模态多级算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于GPU的脉冲星宽带观测的相干消色散研究

国家自然科学基金

0+阅读 · 2013年12月31日

多元对偶小波框架的提升构造及其在图像去噪中的应用

国家自然科学基金

0+阅读 · 2013年12月31日

面向GPU的电力系统电磁暂态并行计算方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

高性能CPU/GPU协同并行可视化技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

CPU/GPU异构平台下并行保结构算法的研究

国家自然科学基金

2+阅读 · 2012年12月31日

相关于算子的Orlicz-型函数空间的实变理论

国家自然科学基金

0+阅读 · 2011年12月31日

基于Sparse-Land模型的SAR图像噪声抑制与分割

国家自然科学基金

0+阅读 · 2009年12月31日

基于图形处理器的高性能计算

国家自然科学基金

0+阅读 · 2009年12月31日

SAR图像二次成像

国家自然科学基金

5+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员