July 19, 2021

NERO: Accelerating Weather Prediction using Near-Memory Reconfigurable Fabric

Gagandeep Singh, Dionysios Diamantopoulos, Juan Gómez-Luna, Christoph Hagleitner, Sander Stuijk, Henk Corporaal, Onur Mutlu

Source: arXiv. arXiv admin note: substantial text overlap with arXiv:2009.08241, arXiv:2106.06433

Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to acceleration. To overcome these challenges, we propose and evaluate the use of near-memory acceleration using a reconfigurable fabric with high-bandwidth memory (HBM). We focus on compound stencils that are fundamental kernels in weather prediction models. By using high-level synthesis techniques, we develop NERO, an FPGA+HBM-based accelerator connected through IBM OCAPI (Open Coherent Accelerator Processor Interface) to an IBM POWER9 host system. Our experimental results show that NERO outperforms a 16-core POWER9 system by 5.3x and 12.7x when running two different compound stencil kernels. NERO reduces the energy consumption by 12x and 35x for the same two kernels over the POWER9 system with an energy efficiency of 1.61 GFLOPS/Watt and 21.01 GFLOPS/Watt. We conclude that employing near-memory acceleration solutions for weather prediction modeling is promising as a means to achieve both high performance and high energy efficiency.
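To make the terms "compound stencil" and "low arithmetic intensity" concrete, here is a minimal sketch in plain Python. It chains two 5-point Laplacian stencils and estimates FLOPs per byte for one stencil pass. The functions `laplacian`, `compound_stencil`, and `arithmetic_intensity` are hypothetical names chosen for illustration, and the kernel is an assumed toy example, not one of the two kernels evaluated in the paper. The byte counts assume 8-byte double-precision values and are rough bounds only.

```python
# Illustrative compound stencil: two chained 5-point Laplacians.
# Hypothetical example; NOT one of the kernels evaluated in the paper.


def laplacian(grid):
    """Apply a 5-point Laplacian to the interior of a 2D grid (list of lists).

    Boundary cells of the output are left at 0.0.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]
                         - 4.0 * grid[i][j])
    return out


def compound_stencil(grid):
    """Chain two elementary stencils: the Laplacian of the Laplacian."""
    return laplacian(laplacian(grid))


def arithmetic_intensity(flops_per_point, bytes_per_point):
    """Floating-point operations performed per byte moved to or from memory."""
    return flops_per_point / bytes_per_point


# Sample field f(i, j) = i^2, whose discrete Laplacian is 2 at interior points.
field = [[float(i * i) for _ in range(6)] for i in range(6)]
first_pass = laplacian(field)
second_pass = compound_stencil(field)

# One 5-point Laplacian: 3 additions + 1 multiplication + 1 subtraction.
FLOPS_PER_POINT = 5
# Assumed bounds on traffic per point with 8-byte doubles:
# no reuse = 5 reads + 1 write; perfect reuse = 1 read + 1 write.
no_reuse = arithmetic_intensity(FLOPS_PER_POINT, 6 * 8)
perfect_reuse = arithmetic_intensity(FLOPS_PER_POINT, 2 * 8)

print(first_pass[2][2], second_pass[2][2])
print(round(no_reuse, 3), perfect_reuse)
```

Even under the optimistic perfect-reuse assumption, this toy kernel performs well under one floating-point operation per byte of memory traffic. That is the kind of memory-bound behavior the abstract points to as motivation for placing compute near high-bandwidth memory.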