Pruning is a neural network optimization technique that trades accuracy for lower computational requirements, and it has proven useful in the extremely constrained environments targeted by tinyML. Unfortunately, special hardware requirements and the limited study of its effectiveness on already compact models have prevented its wider adoption. Depth pruning is a form of pruning that requires no specialized hardware but suffers from a large accuracy drop. To mitigate this, we propose a modification that uses a highly efficient auxiliary network as an effective interpreter of intermediate feature maps. Our results show a parameter reduction of 93% on the MLPerfTiny Visual Wakewords (VWW) task and 28% on the Keyword Spotting (KWS) task, at accuracy costs of 0.65% and 1.06% respectively. When evaluated on a Cortex-M0 microcontroller, our proposed method reduces the VWW model size by 4.7x and latency by 1.6x while, counterintuitively, gaining 1% accuracy. On the same Cortex-M0, the KWS model size and latency are both reduced by 1.2x at the cost of 2.21% accuracy.
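To make the idea concrete, the sketch below illustrates depth pruning with an auxiliary head in PyTorch: a backbone is truncated at an intermediate block and a small auxiliary network is attached to interpret the resulting feature map. The MobileNetV2 backbone, the cut point, and the auxiliary head architecture here are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of depth pruning with an auxiliary head.
# Assumptions: PyTorch + torchvision, a MobileNetV2-style backbone, 96x96 input
# (as in VWW-like settings), and a hypothetical pointwise-conv auxiliary head.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2


class DepthPrunedModel(nn.Module):
    """Truncate a backbone at an intermediate block and attach a lightweight
    auxiliary network that classifies from the intermediate feature map."""

    def __init__(self, cut_index: int = 7, num_classes: int = 2):
        super().__init__()
        backbone = mobilenet_v2(weights=None)
        # Keep only the first `cut_index` feature blocks (depth pruning).
        self.trunk = nn.Sequential(*list(backbone.features.children())[:cut_index])
        trunk_channels = self._infer_channels()
        # Hypothetical auxiliary head: pointwise conv -> global pooling -> linear.
        self.aux_head = nn.Sequential(
            nn.Conv2d(trunk_channels, 64, kernel_size=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def _infer_channels(self) -> int:
        # Probe the truncated trunk to find its output channel count.
        with torch.no_grad():
            return self.trunk(torch.zeros(1, 3, 96, 96)).shape[1]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.aux_head(self.trunk(x))


# Example: a VWW-style binary classifier (person / no person).
model = DepthPrunedModel(cut_index=7, num_classes=2)
print(sum(p.numel() for p in model.parameters()))  # compare against the full backbone
```

Because the trunk keeps only standard convolutional blocks and the auxiliary head is built from ordinary layers, the resulting model needs no specialized sparse-inference hardware, which is the property that distinguishes depth pruning from weight-level pruning.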