We demonstrate a high-performance, vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations (SDEs) on GPUs. The method is integrated with a widely used differential equation solver library in a high-level language (Julia's DifferentialEquations.jl) and enables GPU acceleration without requiring any code changes by the user. Our approach achieves state-of-the-art performance compared to hand-optimized CUDA-C++ kernels, while performing $20$--$100\times$ faster than the vectorized-map (\texttt{vmap}) approach implemented in JAX and PyTorch. Performance evaluations on NVIDIA, AMD, Intel, and Apple GPUs demonstrate performance portability and vendor agnosticism. We show composability with MPI to enable distributed multi-GPU workflows. The implemented solvers are fully featured, supporting event handling, automatic differentiation, and the incorporation of datasets via the GPU's texture memory, allowing scientists to take advantage of GPU acceleration on all major current architectures without changing their model code and without loss of performance.