Keyword spotting (KWS) enables voice-based user interaction with low-power devices at the edge. Because such devices are usually always on, processing audio at the edge saves bandwidth and protects privacy. These devices, for example Cortex-M based microcontrollers, typically have limited memory, compute performance, power, and cost budgets. The challenge is to meet the high computation and low-latency requirements of deep learning on such devices. This paper first presents our small-footprint KWS system running on an STM32F7 microcontroller with a Cortex-M7 core at 216 MHz and 512 KB of static RAM. Our selected convolutional neural network (CNN) architecture uses a reduced number of operations for KWS to meet the constraints of edge devices. Our baseline system produces a classification result every 37 ms, including real-time audio feature extraction. This paper further evaluates the actual performance of different pruning and quantization methods on the microcontroller, including different granularities of sparsity, skipping zero weights, weight-prioritized loop order, and SIMD instructions. The results show that, on microcontrollers, accelerating unstructured pruned models poses considerable challenges, and that structured pruning is more hardware-friendly than unstructured pruning. The results also confirm the performance improvement from quantization and SIMD instructions.
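To make the contrast between sparsity granularities concrete, the following is a minimal C sketch of an int8 (quantized) dot product in three variants: dense, skipping individual zero weights (unstructured), and skipping whole blocks of zero weights (structured). It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names, the block size `BLOCK`, and the per-block flag array `block_nonzero` are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size for the structured-sparsity variant. */
#define BLOCK 4

/* Dense int8 dot product with 32-bit accumulation. */
int32_t dot_dense(const int8_t *w, const int8_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)w[i] * (int32_t)x[i];
    }
    return acc;
}

/* Unstructured sparsity: test every weight and skip the zeros.
 * Each element costs a data-dependent branch, whether or not
 * the multiply-accumulate is actually performed. */
int32_t dot_skip_zero(const int8_t *w, const int8_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (w[i] != 0) {
            acc += (int32_t)w[i] * (int32_t)x[i];
        }
    }
    return acc;
}

/* Structured sparsity: block_nonzero[b] is 1 if block b of BLOCK
 * consecutive weights contains any nonzero value, else 0.
 * One test is made per block; kept blocks run a fixed-length loop. */
int32_t dot_skip_block(const int8_t *w, const int8_t *x,
                       const uint8_t *block_nonzero, size_t n_blocks)
{
    int32_t acc = 0;
    for (size_t b = 0; b < n_blocks; b++) {
        if (!block_nonzero[b]) {
            continue; /* whole block pruned: skip BLOCK multiplies */
        }
        const int8_t *wb = w + b * BLOCK;
        const int8_t *xb = x + b * BLOCK;
        for (size_t j = 0; j < BLOCK; j++) {
            acc += (int32_t)wb[j] * (int32_t)xb[j];
        }
    }
    return acc;
}
```

All three variants return the same value for the same weights. They differ in how often a branch is taken: once per weight in the unstructured case, once per block in the structured case. The fixed-length inner loop of the block variant is also the kind of loop that is easier to unroll or map onto packed multiply-accumulate instructions, which is one plausible reading of why structured pruning is reported as the friendlier option on microcontrollers.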