Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications


March 22, 2023


Stijn Heldens,Ben van Werkhoven

Graphics Processing Units (GPUs) have become ubiquitous in scientific computing. However, writing efficient GPU kernels can be challenging due to the need for careful code tuning. To automatically explore the kernel optimization space, several auto-tuning tools, such as Kernel Tuner, have been proposed. Unfortunately, these existing auto-tuning tools often do not concern themselves with integrating tuning results back into applications, which puts a significant implementation and maintenance burden on application developers. In this work, we present Kernel Launcher: an easy-to-use C++ library that simplifies the creation of highly tuned CUDA applications. With Kernel Launcher, programmers can capture kernel launches, tune the captured kernels for different setups, and integrate the tuning results back into applications using runtime compilation. To showcase the applicability of Kernel Launcher, we consider a real-world computational fluid dynamics code and tune its kernels for different GPUs, input domains, and precisions.
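To make the "integrate the tuning results back into applications" step more concrete, the following is a minimal C++ sketch of how an application might choose a tuned configuration at launch time and turn it into compiler definitions for runtime compilation. It is not Kernel Launcher's API: `KernelConfig`, `TuningRecord`, `select_config`, `make_compile_flags`, and the parameter names `BLOCK_SIZE` and `TILE_FACTOR` are all hypothetical, and the nearest-problem-size matching rule is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tunable parameters for one kernel (not Kernel Launcher's types).
struct KernelConfig {
    int block_size;
    int tile_factor;
};

// One tuning result: the setup it was tuned for and the best configuration found.
struct TuningRecord {
    std::string device_name;
    std::size_t problem_size;
    KernelConfig config;
};

// Pick the record tuned on the same device whose problem size is closest to the
// requested one; return the fallback when that device was never tuned.
// The matching rule is an illustrative assumption.
inline KernelConfig select_config(const std::vector<TuningRecord>& records,
                                  const std::string& device_name,
                                  std::size_t problem_size,
                                  KernelConfig fallback) {
    assert(problem_size > 0);
    const TuningRecord* best = nullptr;
    std::size_t best_distance = 0;
    for (const TuningRecord& r : records) {
        if (r.device_name != device_name) continue;
        const std::size_t distance = r.problem_size > problem_size
                                         ? r.problem_size - problem_size
                                         : problem_size - r.problem_size;
        if (best == nullptr || distance < best_distance) {
            best = &r;
            best_distance = distance;
        }
    }
    return best != nullptr ? best->config : fallback;
}

// Turn a configuration into preprocessor definitions, a common way to
// specialise a kernel's source when it is compiled at runtime.
inline std::vector<std::string> make_compile_flags(const KernelConfig& c) {
    return {"-DBLOCK_SIZE=" + std::to_string(c.block_size),
            "-DTILE_FACTOR=" + std::to_string(c.tile_factor)};
}
```

In a real application, the flags returned by `make_compile_flags` would be handed to a runtime compiler together with the kernel source, so the same source file can be specialised for each GPU and input size without rebuilding the application.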

