State-of-the-art NPUs are typically architected as self-contained subsystems with multiple heterogeneous hardware computing modules and a dataflow-driven programming model. The industry lacks well-established methodologies and tools for evaluating and comparing the performance of NPUs across different architectures. We present an event-based performance modeling framework, VPU-EM, targeting scalable performance evaluation of modern NPUs across diverse AI workloads. The framework adopts a high-level, event-based system-simulation methodology that abstracts away design details for speed while preserving hardware pipelining, concurrency, and interaction with software task scheduling. It is developed natively in Python and built to interface directly with AI frameworks such as TensorFlow, PyTorch, ONNX, and OpenVINO, linking various in-house NPU graph compilers to achieve optimized full-model performance. VPU-EM can also model the power characteristics of an NPU in its Power-EM mode, enabling joint performance/power analysis. Using VPU-EM, we conduct performance/power analysis of models from representative neural network architectures. We demonstrate that although the framework was developed for Intel VPU, an Intel in-house NPU IP technology, the methodology can be generalized to the analysis of modern NPUs.
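To make the event-based methodology concrete, the following is a minimal sketch of a discrete-event simulation of two concurrent hardware modules working as a pipeline. It is not VPU-EM code and does not reflect its API: the function `simulate`, the event names `dma_done` and `compute_done`, and the cycle counts are all hypothetical, and the model assumes unlimited buffering between the two modules (no back-pressure). It only illustrates how a high-level event queue can capture pipelining and concurrency without modeling cycle-level design details.

```python
import heapq


def simulate(num_tiles, dma_cycles, compute_cycles):
    """Return the total cycles for a hypothetical two-stage pipeline.

    Stage 1 (a DMA engine) loads tiles one after another; stage 2 (a compute
    engine) processes each tile once it is loaded and the engine is free.
    Assumption: unlimited buffering between the stages.
    """
    events = []  # min-heap of (time, sequence, kind, tile)
    seq = 0      # tie-breaker so events at the same time pop in push order

    def push(time, kind, tile):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, tile))
        seq += 1

    compute_free_at = 0  # time at which the compute engine becomes idle
    finish = 0           # time of the last completed compute event

    if num_tiles > 0:
        push(dma_cycles, "dma_done", 0)

    while events:
        now, _, kind, tile = heapq.heappop(events)
        if kind == "dma_done":
            # The DMA engine is idle again, so it starts loading the next tile.
            if tile + 1 < num_tiles:
                push(now + dma_cycles, "dma_done", tile + 1)
            # The loaded tile is computed as soon as the compute engine is free.
            start = max(now, compute_free_at)
            compute_free_at = start + compute_cycles
            push(compute_free_at, "compute_done", tile)
        else:  # "compute_done"
            finish = now

    return finish


if __name__ == "__main__":
    # Four tiles, 10-cycle loads, 20-cycle compute: loads overlap with compute.
    print(simulate(4, 10, 20))
```

With these example numbers, a fully serial execution would take 4 × (10 + 20) = 120 cycles, while the overlapped schedule is bounded by the slower compute stage. A real NPU model would add further modules, finite buffers, and software task scheduling on top of the same event-queue mechanism.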