C-for-Metal: High Performance SIMD Programming on Intel GPUs - Zhuanzhi (专知) Papers


Performer · OpenCL · Intel · GPU · Scalar

January 26, 2021

C-for-Metal: High Performance SIMD Programming on Intel GPUs


Guei-Yuan Lueh, Kaiyu Chen, Gang Chen, Joel Fuentes, Wei-Yu Chen, Fangwen Fu, Hong Jiang, Hongzheng Li, Daniel Rhee

The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by the compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications, as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap, we introduce C-For-Metal (CM), an explicit SIMD programming framework designed to deliver close-to-the-metal performance on Intel GPUs. The CM programming language and its vector/matrix types provide an intuitive interface to exploit the underlying hardware features, allowing fine-grained register management, SIMD size control and cross-lane data sharing. Experimental results show that CM applications from different domains outperform the best-known SIMT-based OpenCL implementations, achieving up to 2.7x speedup on the latest Intel GPU.
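The contrast the abstract draws between implicit SIMT and explicit SIMD programming can be sketched in a few lines. This is a conceptual illustration only, not CM code: plain Python lists stand in for GPU memory and vector registers, and every name here (`SIMD_WIDTH`, `simt_kernel`, `run_simt`, `simd_kernel`) is hypothetical and does not come from the CM language or any Intel API. The toy task is a neighbor sum, where each lane adds its own element to the element in the next lane.

```python
# Conceptual sketch only: Python lists stand in for GPU memory and registers.
# All names below are hypothetical; none come from CM or an Intel API.

SIMD_WIDTH = 8  # assumed vector width for this sketch


def simt_kernel(lane_id, memory):
    """SIMT style: scalar code for ONE work item.

    Each lane fetches its own operands from memory, including the
    neighbor's value, because it has no view of the other lanes.
    """
    own = memory[lane_id]
    neighbor = memory[(lane_id + 1) % len(memory)]
    return own + neighbor


def run_simt(memory):
    """Stand-in for the compiler/hardware that implicitly runs the
    scalar kernel once per lane."""
    return [simt_kernel(lane, memory) for lane in range(len(memory))]


def simd_kernel(memory):
    """Explicit SIMD style: the programmer handles the whole vector.

    The data is loaded once into a vector "register"; the neighbor's
    value is obtained by rotating lanes inside that register.
    """
    reg = list(memory[:SIMD_WIDTH])   # one explicit vector load
    rotated = reg[1:] + reg[:1]       # cross-lane data sharing
    return [a + b for a, b in zip(reg, rotated)]


data = list(range(SIMD_WIDTH))
print(run_simt(data))     # [1, 3, 5, 7, 9, 11, 13, 7]
print(simd_kernel(data))  # [1, 3, 5, 7, 9, 11, 13, 7]
```

Both versions compute the same result. The difference is in what the programmer controls: in the second version the vector width, the single load into a register, and the lane rotation are all written out explicitly, which loosely corresponds to the SIMD size control, register management and cross-lane data sharing that the abstract attributes to CM.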

