Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU - Zhuanzhi Papers


Tensor · Sparse · Atom (text editor) · Compiler · Optimizer

September 7, 2022

Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU


Genghan Zhang, Yuetong Zhao, Yanting Tao, Zhongming Yu, Guohao Dai, Sitao Huang, Yuan Wen, Pavlos Petoumenos, Yu Wang

From arXiv, 23 pages, 10 figures

Sparse compilers are a promising solution for sparse tensor algebra optimization. In compiler implementations, reduction in sparse-dense hybrid algebra plays a key role in performance. Although GPUs provide various reduction semantics that can better utilize their parallel computing and memory bandwidth capacity, the central question is how to elevate these flexible reduction semantics to a sparse compilation theory that assumes serial execution. Specifically, we have to tackle two main challenges: (1) adopting a static synchronization granularity wastes parallelism, and (2) a static reduction strategy limits exploration of the optimization space. We propose Sgap: segment group and atomic parallelism to solve these problems. Atomic parallelism captures the flexible reduction semantics to systematically analyze the optimization space of sparse-dense hybrid algebra on GPU. It is a new optimization technique beyond current compiler-based and open-source runtime libraries. Segment group elevates the flexible reduction semantics to suitable levels of abstraction in the sparse compilation theory. It adopts a changeable group size and a user-defined reduction strategy to solve challenges (1) and (2), respectively. Finally, we use GPU sparse matrix-matrix multiplication (SpMM) on the TACO compiler as a use case to demonstrate the effectiveness of segment group in elevating reduction semantics. We achieve up to 1.2x speedup over the original TACO's SpMM kernels. We also apply new optimization techniques found by atomic parallelism to dgSPARSE, an open-source state-of-the-art SpMM library. We achieve 1.6x - 2.3x speedup on the algorithm tuned with atomic parallelism.
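To make the role of group size in reduction more concrete, the following is a minimal sketch in plain Python, not the paper's implementation and not TACO or dgSPARSE code. It emulates SpMM (C = A x B, with A in CSR format and B dense) in two ways: a serial row-by-row reduction, and a nonzero-parallel scheme in which nonzeros are split into groups, partial products sharing a row are combined inside each group, and each group then performs one accumulation per output element it touched (standing in for a GPU `atomicAdd`). All function names, the `group_size` parameter, and the counting of accumulations are assumptions made for illustration.

```python
# Illustrative sketch only: emulates GPU-style reduction strategies for CSR SpMM
# in plain Python. Names and structure are hypothetical, not taken from the paper.

def spmm_serial(indptr, indices, data, B, n_cols):
    """Reference: each output row is reduced serially, one row at a time."""
    n_rows = len(indptr) - 1
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            for k in range(n_cols):
                C[i][k] += data[p] * B[indices[p]][k]
    return C


def row_of_nonzero(indptr):
    """Expand CSR row pointers into one row index per stored nonzero."""
    rows = []
    for i in range(len(indptr) - 1):
        rows.extend([i] * (indptr[i + 1] - indptr[i]))
    return rows


def spmm_grouped(indptr, indices, data, B, n_cols, group_size):
    """Nonzero-parallel emulation with a tunable group size.

    Nonzeros are split into consecutive groups of `group_size`. Within a group,
    partial products that belong to the same row are combined first (a segment
    reduction). Each group then adds its per-row partial results into C; on a
    GPU this final step would be an atomic add. Returns C and the number of
    such emulated atomic adds.
    """
    n_rows = len(indptr) - 1
    rows = row_of_nonzero(indptr)
    C = [[0.0] * n_cols for _ in range(n_rows)]
    atomic_adds = 0
    for start in range(0, len(data), group_size):
        partial = {}  # row index -> partial output row for this group
        for p in range(start, min(start + group_size, len(data))):
            acc = partial.setdefault(rows[p], [0.0] * n_cols)
            for k in range(n_cols):
                acc[k] += data[p] * B[indices[p]][k]
        for i, acc in partial.items():
            for k in range(n_cols):
                C[i][k] += acc[k]  # stands in for atomicAdd on a GPU
                atomic_adds += 1
    return C, atomic_adds


# Small example: a 3x3 sparse matrix with an empty middle row, times a 3x2 dense matrix.
indptr = [0, 2, 2, 5]
indices = [0, 2, 0, 1, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

reference = spmm_serial(indptr, indices, data, B, 2)
for g in (1, 2, 8):
    result, adds = spmm_grouped(indptr, indices, data, B, 2, g)
    assert result == reference
```

In this toy setting, a group size of 1 issues one emulated atomic add per nonzero per output column, while larger groups combine more work locally before touching the output. On real hardware the best group size depends on the sparsity pattern and the device, which is the kind of trade-off a changeable group size is meant to expose.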

