VPU-EM: An Event-based Modeling Framework to Evaluate NPU Performance and Power Efficiency at Scale


March 17, 2023


Charles Qi,Yi Wang,Hui Wang,Yang Lu,Shiva Shankar Subramanian,Finola Cahill,Conall Tuohy,Victor Li,Xu Qian,Darren Crews,Ling Wang,Shivaji Roy,Andrea Deidda,Martin Power,Niall Hanrahan,Rick Richmond,Umer Cheema,Arnab Raha,Alessandro Palla,Gary Baugh,Deepak Mathaikutty

From arXiv; 8 pages, 9 figures

State-of-the-art NPUs are typically architected as self-contained subsystems with multiple heterogeneous hardware computing modules and a dataflow-driven programming model. The industry lacks well-established methodologies and tools to evaluate and compare the performance of NPUs across different architectures. We present an event-based performance modeling framework, VPU-EM, targeting scalable performance evaluation of modern NPUs across diverse AI workloads. The framework adopts a high-level event-based system-simulation methodology that abstracts away design details for speed while still modeling hardware pipelining, concurrency, and interaction with software task scheduling. It is developed natively in Python and built to interface directly with AI frameworks such as TensorFlow, PyTorch, ONNX, and OpenVINO, linking various in-house NPU graph compilers to achieve optimized full-model performance. Furthermore, VPU-EM can model the power characteristics of an NPU in Power-EM mode, enabling joint performance/power analysis. Using VPU-EM, we conduct performance/power analysis of models from representative neural network architectures. We demonstrate that even though this framework was developed for Intel VPU, an Intel in-house NPU IP technology, the methodology can be generalized to the analysis of modern NPUs.
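To make the idea of event-based simulation concrete, here is a minimal sketch in Python. It is not the VPU-EM implementation, whose internals the abstract does not describe. The two-engine DMA-then-compute pipeline, the `Engine` class, the `simulate` function, and all cycle counts are hypothetical and chosen only for illustration. The sketch keeps a time-ordered event queue and jumps from event to event rather than stepping cycle by cycle, which is what lets this style of model trade design detail for speed while still capturing overlap between hardware units.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Engine:
    """A hypothetical hardware unit that serves one task at a time."""
    name: str
    free_at: int = 0       # cycle at which the engine next becomes idle
    busy_cycles: int = 0   # total cycles spent doing work


def simulate(tiles, dma_cycles, compute_cycles):
    """Simulate `tiles` work items flowing through DMA -> compute.

    Returns (total_cycles, dma_engine, compute_engine).
    """
    dma, mac = Engine("dma"), Engine("compute")
    events = []            # min-heap of (time, seq, kind, tile)
    seq = count()          # tie-breaker so equal-time events keep FIFO order

    # Assumption: every tile load is requested at cycle 0.
    for tile in range(tiles):
        heapq.heappush(events, (0, next(seq), "load_ready", tile))

    now = 0
    while events:
        # Jump straight to the next event instead of ticking every cycle.
        now, _, kind, tile = heapq.heappop(events)
        if kind == "load_ready":
            start = max(now, dma.free_at)          # wait for the DMA engine
            dma.free_at = start + dma_cycles
            dma.busy_cycles += dma_cycles
            heapq.heappush(
                events, (dma.free_at, next(seq), "compute_ready", tile))
        elif kind == "compute_ready":
            start = max(now, mac.free_at)          # wait for the compute engine
            mac.free_at = start + compute_cycles
            mac.busy_cycles += compute_cycles
            heapq.heappush(events, (mac.free_at, next(seq), "done", tile))
        # "done" events only advance the simulated clock.
    return now, dma, mac


if __name__ == "__main__":
    total, dma, mac = simulate(tiles=4, dma_cycles=10, compute_cycles=20)
    print(total)  # 90
```

With 4 tiles, 10 DMA cycles, and 20 compute cycles per tile, the run finishes at cycle 90 rather than the 120 cycles a fully serial schedule would need, because each tile's load overlaps the previous tile's compute. The per-engine `busy_cycles` counters are the kind of activity data a power model could consume, although how Power-EM actually derives power is not stated in the abstract.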

