We explore network sparsification strategies with the aim of compressing neural speech enhancement (SE) down to an optimal configuration for a new generation of low-power, microcontroller-based neural accelerators (microNPUs). We examine three unique sparsity structures: weight pruning, block pruning, and unit pruning, and discuss their benefits and drawbacks when applied to SE. We focus on the interplay between computational throughput, memory footprint, and model quality. Our method supports all three structures and jointly learns integer-quantized weights along with sparsity. Additionally, we demonstrate offline magnitude-based pruning of integer-quantized models as a performance baseline. Although efficient speech enhancement is an active area of research, our work is the first to apply block pruning to SE and the first to address SE model compression in the context of microNPUs. Using weight pruning, we show that we are able to compress an already compact model's memory footprint by a factor of 42x, from 3.7 MB to 87 kB, while losing only 0.1 dB SDR in performance. We also show a computational speedup of 6.7x, with a corresponding drop of only 0.59 dB SDR, using block pruning.
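To make the three sparsity structures concrete, the following NumPy sketch applies magnitude-based masking at the element (weight), tile (block), and row (unit) level of a single weight matrix. It is an illustrative sketch only, not the paper's implementation: the block size, sparsity level, and choice of L1/L2 norms are assumptions made for the example.

```python
# Illustrative magnitude-based pruning at three granularities (not the paper's code).
import numpy as np

def weight_prune(W, sparsity):
    """Unstructured pruning: zero the smallest-magnitude individual weights."""
    k = int(round(sparsity * W.size))
    if k == 0:
        return W.copy()
    thresh = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) <= thresh, 0.0, W)

def block_prune(W, sparsity, block=(4, 4)):
    """Block pruning: zero whole bh-by-bw tiles with the lowest L1 norm."""
    bh, bw = block
    H, Wd = W.shape
    assert H % bh == 0 and Wd % bw == 0
    tiles = W.reshape(H // bh, bh, Wd // bw, bw)
    norms = np.abs(tiles).sum(axis=(1, 3))          # L1 norm per tile
    k = int(round(sparsity * norms.size))
    mask = np.ones_like(norms)
    if k > 0:
        mask.flat[np.argsort(norms, axis=None)[:k]] = 0.0
    # Broadcast the tile-level mask back to element resolution.
    mask_full = np.repeat(np.repeat(mask, bh, axis=0), bw, axis=1)
    return W * mask_full

def unit_prune(W, sparsity):
    """Unit (neuron) pruning: zero entire output rows with the lowest L2 norm."""
    norms = np.linalg.norm(W, axis=1)
    k = int(round(sparsity * norms.size))
    keep = np.ones_like(norms)
    if k > 0:
        keep[np.argsort(norms)[:k]] = 0.0
    return W * keep[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 16)).astype(np.float32)
    for name, fn in [("weight", weight_prune), ("block", block_prune), ("unit", unit_prune)]:
        Wp = fn(W, 0.75)
        print(f"{name:>6} pruning: {np.mean(Wp == 0.0):.2%} zeros")
```

The trade-off the abstract alludes to is visible in the granularity: unstructured weight pruning preserves the most accuracy per zeroed parameter (favoring memory footprint), while block and unit pruning zero contiguous regions that hardware such as a microNPU can skip efficiently (favoring computational throughput).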