We present tritonBLAS, a fast, deterministic analytical model that uses architectural parameters, such as the cache hierarchy and the relative placement of code and data, to generate performant GPU GEMM kernels. tritonBLAS explicitly models the relationship between architectural topology, matrix shapes, and algorithmic blocking behavior to predict near-optimal configurations without runtime autotuning. Based on this model, we developed and implemented a lightweight GEMM framework entirely within Triton. We evaluate tritonBLAS across a diverse set of GEMM problem sizes on modern GPUs, where it achieves over 95% of the performance of autotuned solutions while reducing autotuning time to zero. This makes tritonBLAS a practical drop-in replacement for empirical tuning in production HPC and ML workloads.
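To make the idea concrete, here is a minimal sketch, not the actual tritonBLAS model, of how an analytical model can deterministically pick GEMM block sizes from architectural parameters alone, with no runtime autotuning. All constants (shared-memory capacity, candidate block sizes, fp16 operands) and the scoring heuristic are illustrative assumptions.

```python
# Hypothetical sketch of an analytical tile-size model: choose
# (BLOCK_M, BLOCK_N, BLOCK_K) by maximizing data reuse subject to a
# shared-memory capacity constraint. Parameter values are assumptions,
# not tritonBLAS internals.
from itertools import product

SHARED_MEM_BYTES = 64 * 1024   # assumed per-SM/CU shared-memory capacity
DTYPE_BYTES = 2                # assumed fp16 operands


def score(bm: int, bn: int, bk: int) -> float:
    """Arithmetic intensity of one tile: FLOPs per byte staged through
    shared memory. Higher means more reuse of loaded data."""
    flops = 2 * bm * bn * bk
    bytes_moved = (bm * bk + bk * bn) * DTYPE_BYTES
    return flops / bytes_moved


def pick_config(m: int, n: int, k: int):
    """Deterministically choose block sizes for an m x n x k GEMM:
    maximize reuse while both input tiles fit in shared memory
    (double-buffered), and never exceed the problem dimensions."""
    best, best_score = None, -1.0
    for bm, bn, bk in product([32, 64, 128, 256], repeat=3):
        if bm > m or bn > n or bk > k:
            continue
        # Capacity constraint: A-tile + B-tile, double-buffered.
        if 2 * (bm * bk + bk * bn) * DTYPE_BYTES > SHARED_MEM_BYTES:
            continue
        s = score(bm, bn, bk)
        if s > best_score:
            best, best_score = (bm, bn, bk), s
    return best


print(pick_config(4096, 4096, 4096))  # → (256, 256, 32)
```

Because the search is a closed-form evaluation over a small fixed candidate set, the prediction costs microseconds and is fully reproducible, which is the property that lets a model like this replace empirical autotuning sweeps.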