Efficient on-device neural network (NN) inference offers predictable latency, improved privacy and reliability, and lower operating costs for vendors compared to cloud-based inference. This has sparked recent development of microcontroller-scale NN accelerators, also known as neural processing units ($\mu$NPUs), designed specifically for ultra-low-power applications. We present the first comparative evaluation of a number of commercially available $\mu$NPUs, including the first independent benchmarks for multiple platforms. To ensure fairness, we develop and open-source a model compilation pipeline supporting consistent benchmarking of quantized models across diverse microcontroller hardware. Our resulting analysis uncovers both expected performance trends and surprising disparities between hardware specifications and actual performance, including certain $\mu$NPUs exhibiting unexpected scaling behaviors with model complexity. This work provides a foundation for ongoing evaluation of $\mu$NPU platforms, alongside offering practical insights for both hardware and software developers in this rapidly evolving space.