Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU

16 December 2022

Genghan Zhang, Yuetong Zhao, Yanting Tao, Zhongming Yu, Guohao Dai, Sitao Huang, Yuan Wen, Pavlos Petoumenos, Yu Wang

From arXiv; 23 pages, 10 figures

Sparse compilers are a promising solution for sparse tensor algebra optimization. In compiler implementation, reduction in sparse-dense hybrid algebra plays a key role in performance. Although GPUs provide various reduction semantics that can better utilize their parallel computing and memory bandwidth capacity, the central question is how to elevate these flexible reduction semantics to a sparse compilation theory that assumes serial execution. Specifically, we have to tackle two main challenges: (1) a static synchronization granularity wastes parallelism, and (2) a static reduction strategy limits exploration of the optimization space. We propose Sgap, segment group and atomic parallelism, to solve these problems. Atomic parallelism captures the flexible reduction semantics to systematically analyze the optimization space of sparse-dense hybrid algebra on GPU. It is a new optimization technique beyond current compiler-based and open-source runtime libraries. Segment group elevates the flexible reduction semantics to suitable levels of abstraction in the sparse compilation theory. It adopts a changeable group size and a user-defined reduction strategy to solve challenges (1) and (2), respectively. Finally, we use GPU sparse matrix-matrix multiplication (SpMM) on the TACO compiler as a use case to demonstrate the effectiveness of segment group in reduction semantics elevation. We achieve up to 1.2x speedup over the original TACO's SpMM kernels. We also apply new optimization techniques found by atomic parallelism to dgSPARSE, an open-source state-of-the-art SpMM library. We achieve 1.6x - 2.3x speedup on the algorithm tuned with atomic parallelism.
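
The reduction trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, nor code generated by TACO or taken from dgSPARSE: the function names (`spmm_row_serial`, `spmm_segment_group`), the CSR input layout, and the `group_size` parameter are assumptions chosen for illustration. The sketch runs sequentially, so the step that would be an atomic add on a GPU is simulated by a plain accumulation into the output.

```python
def spmm_row_serial(indptr, indices, data, B, n_cols):
    """Baseline: one serial reduction per sparse row (C = A @ B, A in CSR)."""
    n_rows = len(indptr) - 1
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            k, a = indices[p], data[p]
            for j in range(n_cols):
                C[i][j] += a * B[k][j]
    return C


def spmm_segment_group(indptr, indices, data, B, n_cols, group_size):
    """Hypothetical grouped variant: nonzeros are split into groups of
    `group_size`; each group reduces its segments (runs of nonzeros that
    share a row) locally, then adds the partial sums into C."""
    n_rows = len(indptr) - 1
    nnz = len(data)
    # Expand the CSR row pointers into one row index per nonzero.
    rows = [i for i in range(n_rows) for _ in range(indptr[i], indptr[i + 1])]
    C = [[0.0] * n_cols for _ in range(n_rows)]

    def flush(row, partial):
        # On a GPU this accumulation would be an atomic add, because
        # different groups may hold partial sums for the same row.
        for j in range(n_cols):
            C[row][j] += partial[j]

    for start in range(0, nnz, group_size):
        end = min(start + group_size, nnz)
        seg_row = rows[start]
        partial = [0.0] * n_cols
        for p in range(start, end):
            if rows[p] != seg_row:  # segment boundary inside the group
                flush(seg_row, partial)
                seg_row = rows[p]
                partial = [0.0] * n_cols
            k, a = indices[p], data[p]
            for j in range(n_cols):
                partial[j] += a * B[k][j]
        flush(seg_row, partial)
    return C


# A = [[1, 0, 2], [0, 0, 0], [3, 4, 5]] in CSR form, B is 3 x 2 dense.
indptr = [0, 2, 2, 5]
indices = [0, 2, 0, 1, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

reference = spmm_row_serial(indptr, indices, data, B, 2)
for g in (1, 2, 4, 8):
    assert spmm_segment_group(indptr, indices, data, B, 2, g) == reference
```

Changing `group_size` changes how many partial sums are flushed into the output, which is the kind of synchronization-granularity choice the abstract refers to, while the result stays the same as the row-serial baseline.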
