Deploying TinyML models on low-cost IoT hardware is challenging because device memory capacity is severely limited. Neural processing unit (NPU) hardware addresses this memory challenge with model compression, exploiting weight quantization and sparsity to fit more parameters in the same footprint. However, designing compressible neural networks (NNs) is itself difficult, as it expands the design space across which we must make balanced trade-offs. This paper demonstrates Unified DNAS for Compressible (UDC) NNs, which explores a large search space to generate state-of-the-art compressible NNs for NPUs. On ImageNet, UDC networks are up to $3.35\times$ smaller at iso-accuracy, or 6.25% more accurate at iso-model-size, than previous work.
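To make the memory arithmetic concrete, the following is a minimal illustrative sketch (not the paper's method, and a simplified stand-in for any real NPU's compressed weight format) of how unstructured sparsity combined with 4-bit quantization shrinks a layer's storage footprint relative to dense fp32 weights:

```python
import numpy as np

# Hypothetical example layer: 256x256 dense fp32 weights.
rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)
dense_bytes = weights.size * 4  # fp32 baseline: 4 bytes per weight

# Unstructured pruning: zero out the 75% smallest-magnitude weights.
threshold = np.quantile(np.abs(weights), 0.75)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)
nonzero = np.count_nonzero(pruned)

# Symmetric 4-bit quantization of the surviving weights (int range [-7, 7]).
scale = np.abs(pruned).max() / 7
quantized = np.clip(np.round(pruned / scale), -7, 7).astype(np.int8)

# Compressed storage: 4 bits per nonzero value plus a 1-bit presence mask
# per position (an assumed, simplified sparse encoding).
compressed_bits = nonzero * 4 + weights.size * 1
compressed_bytes = compressed_bits / 8

print(f"dense:      {dense_bytes} bytes")
print(f"compressed: {compressed_bytes:.0f} bytes "
      f"({dense_bytes / compressed_bytes:.1f}x smaller)")
```

Under these assumptions the same SRAM footprint holds roughly an order of magnitude more parameters, which is why the network architecture itself must be searched for compressibility (how well accuracy survives the pruning and quantization) rather than for dense accuracy alone.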