Deep neural network (DNN) based AI applications on the edge require both low-cost computing platforms and high-quality services. However, the limited memory, computing resources, and power budget of edge devices constrain the effectiveness of DNN algorithms, making it challenging to develop edge-oriented AI algorithms and implementations (e.g., accelerators). In this paper, we summarize our recent efforts toward efficient on-device AI development from three aspects, covering both training and inference. First, we present on-device training with ultra-low memory usage. We propose a novel rank-adaptive tensorized neural network model, which offers orders-of-magnitude memory reduction during training. Second, we introduce an ultra-low-bitwidth quantization method for DNN model compression, achieving state-of-the-art accuracy under the same compression ratio. Third, we introduce an ultra-low-latency DNN accelerator design that follows a software/hardware co-design methodology. This paper emphasizes the importance and efficacy of training, quantization, and accelerator design, and calls for more research breakthroughs in this area for AI on the edge.
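To make the "orders-of-magnitude memory reduction" claim of tensorized training concrete, the sketch below compares the parameter count of a dense layer with a tensor-train (TT) factorization of the same weight matrix. The mode sizes and TT rank here are illustrative assumptions, not the paper's actual configuration or its rank-adaptive method.

```python
# Illustrative only: parameter count of a dense 1024x1024 layer versus a
# tensor-train factorization with assumed mode sizes and a fixed rank.
dense_params = 1024 * 1024  # full weight matrix

# Factor each dimension as 4*4*4*4*4 = 1024; each TT core has shape
# (rank_left, mode_in, mode_out, rank_right), with boundary ranks of 1.
modes_in, modes_out, rank = [4] * 5, [4] * 5, 8
tt_params = sum(
    (rank if k > 0 else 1) * mi * mo * (rank if k < 4 else 1)
    for k, (mi, mo) in enumerate(zip(modes_in, modes_out))
)

print(dense_params, tt_params, dense_params // tt_params)
```

Even with these modest assumed settings, the TT representation stores a few thousand parameters instead of about a million, which is the kind of compression that makes training feasible within edge memory budgets.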
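As a minimal illustration of ultra-low-bitwidth quantization, the sketch below applies generic uniform symmetric quantization to a weight tensor. This is a standard baseline technique, not the specific state-of-the-art method the abstract refers to; the function names and the 2-bit setting are assumptions for illustration.

```python
import numpy as np

def quantize_symmetric(w, bits=2):
    """Uniform symmetric quantization of a weight tensor to `bits` bits.

    Generic baseline for illustration, not the paper's proposed method.
    """
    qmax = 2 ** (bits - 1) - 1              # largest positive level
    scale = np.abs(w).max() / qmax          # map max magnitude to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), float(scale)

def dequantize(q, scale):
    """Recover an approximate float tensor from integer levels."""
    return q.astype(np.float32) * scale

# Example: 2-bit quantization keeps only four levels per weight.
w = np.array([0.9, -1.0, 0.4], dtype=np.float32)
q, s = quantize_symmetric(w, bits=2)
w_hat = dequantize(q, s)
```

At 2 bits each weight occupies one of only four integer levels, a 16x storage reduction versus float32; the research challenge the abstract points to is retaining accuracy at such extreme bitwidths.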