With the surge of inexpensive computational and memory resources, neural networks (NNs) have experienced unprecedented growth in architectural and computational complexity. Bringing NNs to resource-constrained devices enables cost-efficient deployments, widespread availability, and the preservation of sensitive data. This work addresses the challenges of bringing Machine Learning to microcontroller units (MCUs), where we focus on the ubiquitous ARM Cortex-M architecture. The detailed effects and trade-offs that optimization methods, software frameworks, and MCU hardware architecture have on key performance metrics such as inference latency and energy consumption have not previously been studied in depth for state-of-the-art frameworks such as TensorFlow Lite Micro. We find that empirical investigations which measure the perceptible metrics, i.e., performance as experienced by the user, are indispensable, as the impact of specialized instructions and layer types can be subtle. To this end, we propose implementation-aware design as a cost-effective method for verification and benchmarking. Employing our developed toolchain, we demonstrate how existing NN deployments on resource-constrained devices can be improved by systematically optimizing NNs for their targeted application scenario.