Despite its importance for federated learning, continual learning, and many other applications, on-device training remains an open problem for EdgeAI. The problem stems from the large number of operations (e.g., floating-point multiplications and additions) and the memory consumption required during training by the back-propagation algorithm. Consequently, in this paper, we propose a new gradient filtering approach that enables on-device DNN model training. More precisely, our approach creates a special structure with fewer unique elements in the gradient map, thus significantly reducing the computational complexity and memory consumption of back-propagation during training. Extensive experiments on image classification and semantic segmentation with multiple DNN models (e.g., MobileNet, DeepLabV3, UPerNet) and devices (e.g., Raspberry Pi and Jetson Nano) demonstrate the effectiveness and wide applicability of our approach. For example, compared to SOTA, we achieve up to 19$\times$ speedup and 77.1% memory savings on ImageNet classification with only 0.1% accuracy loss. Finally, our method is easy to implement and deploy; over 20$\times$ speedup and 90% energy savings have been observed compared to highly optimized baselines in MKLDNN and CUDNN on NVIDIA Jetson Nano. Overall, our approach opens up a new direction of research with huge potential for on-device training.
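As a rough illustration of the core idea, the sketch below shows one way such a structured gradient map could be produced in PyTorch: an identity operation in the forward pass whose backward pass replaces the incoming gradient with a patch-wise average, so that each $r \times r$ patch shares a single value (i.e., fewer unique elements). The names `GradientFilter` and `filter_gradients`, and the choice of patch size, are hypothetical and for illustration only; the actual method additionally exploits this structure to reduce the multiplications in the convolution backward pass, which this sketch does not attempt.

```python
import torch
import torch.nn.functional as F


class GradientFilter(torch.autograd.Function):
    """Illustrative sketch (not the paper's optimized kernel): identity in the
    forward pass; in the backward pass, the incoming gradient map is averaged
    over non-overlapping r x r patches, so each patch shares one value."""

    @staticmethod
    def forward(ctx, x, patch_size=4):
        ctx.patch_size = patch_size
        return x.view_as(x)  # identity; gradients are filtered on the way back

    @staticmethod
    def backward(ctx, grad_output):
        r = ctx.patch_size
        # Average the gradient over non-overlapping r x r patches
        # (assumes H and W are divisible by r for simplicity).
        pooled = F.avg_pool2d(grad_output, kernel_size=r, stride=r)
        # Broadcast each patch average back to its original spatial extent,
        # yielding a gradient map with at most (H/r)*(W/r) unique values
        # per channel.
        filtered = F.interpolate(pooled, size=grad_output.shape[-2:], mode="nearest")
        return filtered, None  # no gradient w.r.t. patch_size


def filter_gradients(x, patch_size=4):
    """Hypothetical helper: insert after a layer's output so the gradient
    flowing back through that layer carries the patch-shared structure."""
    return GradientFilter.apply(x, patch_size)
```

Under these assumptions, one would wrap the outputs of the layers selected for on-device fine-tuning with `filter_gradients`, leaving the forward pass (and hence inference accuracy) untouched while structuring the gradients used during back-propagation.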