PAGANI: 数字学的平行适应性GPU数值 (PAGANI: A Parallel Adaptive GPU Algorithm for Numerical) - 专知论文

会员服务 ·

0

Integration · GPU · 相互独立的 · 模型评估 · Performance ·

2021 年 6 月 23 日

PAGANI: A Parallel Adaptive GPU Algorithm for Numerical

翻译：PAGANI: 数字学的平行适应性GPU数值

Ioannis Sakiotis,Kamesh Arumugam,Marc Paterno,Desh Ranjan,Balša Terzić,Mohammad Zubair

We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core utilization is difficult to achieve because the adaptive work-load can vary greatly across the integration space and is impossible to predict a priori. Existing parallel algorithms utilize sequential computations on independent processors, which results in bottlenecks due to the need for data redistribution and processor synchronization. Our algorithm employs a high-throughput approach in which all existing sub-regions are processed and sub-divided in parallel. Repeated sub-region classification and filtering improves upon a brute-force approach and allows the algorithm to make efficient use of computation and memory resources. A CUDA implementation shows orders of magnitude speedup over the fastest open-source CPU method and extends the achievable accuracy for difficult integrands. Our algorithm typically outperforms other existing deterministic parallel methods.

翻译：我们为大规模平行建筑的多维数字整合这一具有挑战性的问题提出了一种新的适应性平行算法。适应性算法展示了最佳的性能,但高效的多核心利用却难以实现,因为适应性工作负荷在整个整合空间中差异很大,无法预先预测。现有的平行算法在独立处理器上采用连续计算,这导致由于数据再分配和处理器同步的需要而出现瓶颈。我们的算法采用高通量法,所有现有的子区域都同时处理和分解。重复性的次区域分类和过滤法改进了粗力方法,使算法能够有效地使用计算和记忆资源。CUDA的实施显示快速的开放源代码处理法的加速度,并扩大了困难的元件的可实现的准确性。我们的算法通常优于其他现有的确定性平行方法。

0

相关内容

Integration

Integration：Integration, the VLSI Journal。 Explanation：集成，VLSI杂志。 Publisher：Elsevier。 SIT：http://dblp.uni-trier.de/db/journals/integration/

【AAAI2021】Lipschitz终身强化学习

专知会员服务

31+阅读 · 2020年12月14日

一份简单《图神经网络》教程，28页ppt

一份简单《图神经网络》教程，28页ppt

专知会员服务

127+阅读 · 2020年8月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【ICML2020】机器学习产品生产部署流程，54页ppt讲述实际ML生产部署

【ICML2020】机器学习产品生产部署流程，54页ppt讲述实际ML生产部署

专知会员服务

54+阅读 · 2020年7月18日

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

专知会员服务

66+阅读 · 2020年6月22日

【硬核书】可扩展机器学习：并行分布式方法

【硬核书】可扩展机器学习：并行分布式方法

专知会员服务

86+阅读 · 2020年5月23日

【Google】微型化机器学习教程，17页ppt，Getting Started with TinyML

【Google】微型化机器学习教程，17页ppt，Getting Started with TinyML

专知会员服务

71+阅读 · 2020年3月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

鲁棒机器学习相关文献集

鲁棒机器学习相关文献集

专知

8+阅读 · 2019年8月18日

CCF推荐 | 国际会议信息6条

CCF推荐 | 国际会议信息6条

Call4Papers

9+阅读 · 2019年8月13日

CCF推荐 | 国际会议信息8条

CCF推荐 | 国际会议信息8条

Call4Papers

9+阅读 · 2019年5月23日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

Accelerating Domain Propagation: an Efficient GPU-Parallel Algorithm over Sparse Matrices

Arxiv

0+阅读 · 2021年8月25日

Lossless Image Compression Using a Multi-Scale Progressive Statistical Model

Arxiv

0+阅读 · 2021年8月24日

SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm

Arxiv

0+阅读 · 2021年8月23日

Self-Directed Online Machine Learning for Topology Optimization

Arxiv

0+阅读 · 2021年8月22日

Regstar: Efficient Strategy Synthesis for Adversarial Patrolling Games

Arxiv

0+阅读 · 2021年8月19日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

Attributed Graph Clustering via Adaptive Graph Convolution

Arxiv

11+阅读 · 2019年6月4日

High-quality nonparallel voice conversion based on cycle-consistent adversarial network

Arxiv

4+阅读 · 2018年4月2日

CuLDA_CGS: Solving Large-scale LDA Problems on GPUs

Arxiv

3+阅读 · 2018年3月13日

Being Robust (in High Dimensions) Can Be Practical

Arxiv

3+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

相互独立的

相关VIP内容

【AAAI2021】Lipschitz终身强化学习

专知会员服务

31+阅读 · 2020年12月14日

一份简单《图神经网络》教程，28页ppt

一份简单《图神经网络》教程，28页ppt

专知会员服务

127+阅读 · 2020年8月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【ICML2020】机器学习产品生产部署流程，54页ppt讲述实际ML生产部署

【ICML2020】机器学习产品生产部署流程，54页ppt讲述实际ML生产部署

专知会员服务

54+阅读 · 2020年7月18日

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

专知会员服务

66+阅读 · 2020年6月22日

【硬核书】可扩展机器学习：并行分布式方法

【硬核书】可扩展机器学习：并行分布式方法

专知会员服务

86+阅读 · 2020年5月23日

【Google】微型化机器学习教程，17页ppt，Getting Started with TinyML

【Google】微型化机器学习教程，17页ppt，Getting Started with TinyML

专知会员服务

71+阅读 · 2020年3月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

大型语言模型遇上文本属性图：一种融合框架与应用的综述

人工智能赋能自主武器与人类控制第三部分：人类控制与系统操作员 | 35页

【博士论文】用于概率程序与生成模型的变分推断

军事指挥控制系统：2025年5种用途

相关资讯

鲁棒机器学习相关文献集

鲁棒机器学习相关文献集

专知

8+阅读 · 2019年8月18日

CCF推荐 | 国际会议信息6条

CCF推荐 | 国际会议信息6条

Call4Papers

9+阅读 · 2019年8月13日

CCF推荐 | 国际会议信息8条

CCF推荐 | 国际会议信息8条

Call4Papers

9+阅读 · 2019年5月23日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

相关论文

Accelerating Domain Propagation: an Efficient GPU-Parallel Algorithm over Sparse Matrices

Arxiv

0+阅读 · 2021年8月25日

Lossless Image Compression Using a Multi-Scale Progressive Statistical Model

Arxiv

0+阅读 · 2021年8月24日

SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm

Arxiv

0+阅读 · 2021年8月23日

Self-Directed Online Machine Learning for Topology Optimization

Arxiv

0+阅读 · 2021年8月22日

Regstar: Efficient Strategy Synthesis for Adversarial Patrolling Games

Arxiv

0+阅读 · 2021年8月19日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

Attributed Graph Clustering via Adaptive Graph Convolution

Arxiv

11+阅读 · 2019年6月4日

High-quality nonparallel voice conversion based on cycle-consistent adversarial network

Arxiv

4+阅读 · 2018年4月2日

CuLDA_CGS: Solving Large-scale LDA Problems on GPUs

Arxiv

3+阅读 · 2018年3月13日

Being Robust (in High Dimensions) Can Be Practical

Arxiv

3+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员