Deep neural networks (DNNs) have achieved unprecedented success in artificial intelligence (AI), including computer vision, natural language processing, and speech recognition. However, their superior performance comes at the cost of considerable computational complexity, which greatly hinders their deployment on many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that can remove this efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems, and applications. We start by introducing popular model compression methods, including pruning, factorization, and quantization, as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce efficient deep learning system design from both software and hardware perspectives.