加速以 GPU 为基础的电离电离电流电流压缩加速计算 (Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression) - 专知论文

会员服务 ·

0

查准率/准确率 · 可约的 · Processing（编程语言） · GPU · Performer ·

2021 年 9 月 12 日

Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression

翻译：加速以 GPU 为基础的电离电离电流电流压缩加速计算

Jingcheng Shen,Yifan Wu,Masao Okita,Fumihiko Ino

from arxiv, 7 pages, 7 figures, color

Stencil computation is an important class of scientific applications that can be efficiently executed by graphics processing units (GPUs). Out-of-core approach helps run large scale stencil codes that process data with sizes larger than the limited capacity of GPU memory. However, the performance of the GPU-based out-of-core stencil computation is always limited by the data transfer between the CPU and GPU. Many optimizations have been explored to reduce such data transfer, but the study on the use of on-the-fly compression techniques is far from sufficient. In this study, we propose a method that accelerates the GPU-based out-of-core stencil computation with on-the-fly compression. We introduce a novel data compression approach that solves the data dependency between two contiguous decomposed data blocks. We also modify a widely used GPU-based compression library to support pipelining that overlaps CPU/GPU data transfer with GPU computation. Experimental results show that the proposed method achieved a speedup of 1.2x compared the method without compression. Moreover, although the precision loss involved by compression increased with the number of time steps, the precision loss was trivial up to 4,320 time steps, demonstrating the usefulness of the proposed method.

翻译：Stencils 计算是一个重要的科学应用类别,可以由图形处理器(GPUs)高效率地执行。核心外方法有助于运行大型的 Stencils 代码,处理比GPU内存容量大得多的数据。然而,基于 GPU 的基于核心的外线性能计算总是受CPU和GPU之间数据传输的限制。许多优化已经探索过以减少这种数据传输,但关于使用在飞中压缩技术的研究远远不够充分。在本研究中,我们建议采用一种方法,加速基于 GPU 的核心超线计算,在飞中压缩中处理规模大于GPU 内存有限容量的数据。但我们采用了一种新的数据压缩方法,解决两个相毗连的分解式数据区块之间的数据依赖性。我们还修改了一个广泛使用的基于 GPUPU的压缩库,以支持将CPU/GPU数据传输与GPU计算相重叠的管道衬托线。实验结果显示,拟议的方法在不压缩的情况下实现了1.2x的加速。此外,尽管压缩所需的精确性耗损是时间步骤,但压缩后,压缩的精度损失。

0

相关内容

查准率/准确率

查准率/准确率

最新《图神经网络模型》概述，21页pdf

专知会员服务

137+阅读 · 2020年8月24日

【陈天奇】TVM：端到端自动深度学习编译器，244页ppt

【陈天奇】TVM：端到端自动深度学习编译器，244页ppt

专知会员服务

87+阅读 · 2020年5月11日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

59+阅读 · 2020年1月25日

【O'Reilly TensorFlow Conference 2019】恶意软件检测（Generative malware outbreak detection），Sean Park | Trend Micro

【O'Reilly TensorFlow Conference 2019】恶意软件检测（Generative malware outbreak detection），Sean Park | Trend Micro

专知会员服务

15+阅读 · 2019年11月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

已删除

将门创投

5+阅读 · 2019年10月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Deep Cox Mixtures for Survival Regression

Arxiv

1+阅读 · 2021年11月2日

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Arxiv

0+阅读 · 2021年11月1日

Back to Basics: Efficient Network Compression via IMP

Arxiv

0+阅读 · 2021年11月1日

Efficient Training of Audio Transformers with Patchout

Arxiv

1+阅读 · 2021年10月29日

Compression of Deep Learning Models for Text: A Survey

Compression of Deep Learning Models for Text: A Survey

Arxiv

7+阅读 · 2020年8月12日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

12+阅读 · 2020年6月23日

On Layer Normalization in the Transformer Architecture

Arxiv

4+阅读 · 2020年2月12日

The Knowledge Within: Methods for Data-Free Model Compression

The Knowledge Within: Methods for Data-Free Model Compression

Arxiv

3+阅读 · 2019年12月3日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

66+阅读 · 2019年9月8日

CuLDA_CGS: Solving Large-scale LDA Problems on GPUs

Arxiv

3+阅读 · 2018年3月13日

VIP会员

文章信息

相关主题

查准率/准确率

Processing（编程语言）

相关VIP内容

最新《图神经网络模型》概述，21页pdf

专知会员服务

137+阅读 · 2020年8月24日

【陈天奇】TVM：端到端自动深度学习编译器，244页ppt

【陈天奇】TVM：端到端自动深度学习编译器，244页ppt

专知会员服务

87+阅读 · 2020年5月11日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

59+阅读 · 2020年1月25日

【O'Reilly TensorFlow Conference 2019】恶意软件检测（Generative malware outbreak detection），Sean Park | Trend Micro

【O'Reilly TensorFlow Conference 2019】恶意软件检测（Generative malware outbreak detection），Sean Park | Trend Micro

专知会员服务

15+阅读 · 2019年11月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

已删除

将门创投

5+阅读 · 2019年10月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

相关论文

Deep Cox Mixtures for Survival Regression

Arxiv

1+阅读 · 2021年11月2日

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Arxiv

0+阅读 · 2021年11月1日

Back to Basics: Efficient Network Compression via IMP

Arxiv

0+阅读 · 2021年11月1日

Efficient Training of Audio Transformers with Patchout

Arxiv

1+阅读 · 2021年10月29日

Compression of Deep Learning Models for Text: A Survey

Compression of Deep Learning Models for Text: A Survey

Arxiv

7+阅读 · 2020年8月12日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

12+阅读 · 2020年6月23日

On Layer Normalization in the Transformer Architecture

Arxiv

4+阅读 · 2020年2月12日

The Knowledge Within: Methods for Data-Free Model Compression

The Knowledge Within: Methods for Data-Free Model Compression

Arxiv

3+阅读 · 2019年12月3日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

66+阅读 · 2019年9月8日

CuLDA_CGS: Solving Large-scale LDA Problems on GPUs

Arxiv

3+阅读 · 2018年3月13日

微信扫码咨询专知VIP会员