Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy-, computation-, and memory-intensive. This impedes the deployment of large DNNs on low-power devices with limited compute resources. Recent research improves DNN models by reducing their memory requirements, energy consumption, and number of operations without significantly decreasing accuracy. This paper surveys progress in low-power deep learning and computer vision, specifically with regard to inference, and discusses methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, and disadvantages of the techniques in each category, along with potential solutions to their shortcomings. We also discuss new evaluation metrics as a guideline for future research.
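To make the first category concrete, the following is a minimal sketch of magnitude-based weight pruning and symmetric uniform quantization, two representative techniques from category (1). The function names and the NumPy-based formulation are illustrative assumptions, not the specific algorithms evaluated in the surveyed works.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Magnitude pruning (illustrative sketch): zero out the
    `sparsity` fraction of weights with the smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_uniform(weights, num_bits=8):
    """Symmetric uniform quantization (illustrative sketch):
    round weights to num_bits signed integers, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    if scale == 0:
        return weights.copy()
    q = np.round(weights / scale).clip(-qmax, qmax)
    return q * scale
```

In practice, pruning is typically interleaved with fine-tuning to recover accuracy, and quantization may be applied post-training or during training; this sketch shows only the core arithmetic behind both ideas.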