Recently, a new trend of exploring sparsity to accelerate neural network training has emerged, embracing the paradigm of training on the edge. This paper proposes a novel Memory-Economic Sparse Training (MEST) framework that targets accurate and fast execution on edge devices. The MEST framework is enhanced by Elastic Mutation (EM) and Soft Memory Bound (&S), which ensure superior accuracy at high sparsity ratios. Unlike existing work on sparse training, this work reveals how strongly the choice of sparsity scheme affects sparse-training performance, in terms of both accuracy and training speed on real edge devices. On top of that, the paper proposes to employ data efficiency to further accelerate sparse training. Our results suggest that unforgettable examples can be identified in situ, even while sparsity masks are being dynamically explored during sparse training, and can therefore be removed to speed up training on edge devices further. Compared with state-of-the-art (SOTA) works on accuracy, MEST significantly increases Top-1 accuracy on ImageNet when using the same unstructured sparsity scheme. Systematic evaluations of accuracy, training speed, and memory footprint are conducted, in which the proposed MEST framework consistently outperforms representative SOTA works. A reviewer argued strongly against our work based on his false assumptions and misunderstandings. Relative to the previous submission, this version additionally employs data efficiency to further accelerate sparse training, and explores the impact of model sparsity, sparsity schemes, and sparse training algorithms on the number of removable training examples. Our code is publicly available at: https://github.com/boone891214/MEST.
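The abstract refers to the dynamic exploration of sparsity masks but does not spell out how EM or &S work. As a point of reference only, the sketch below shows a generic magnitude-prune / random-grow mask update of the kind used in dynamic sparse training (in the style of SET). It is not MEST's Elastic Mutation; the function name `prune_and_grow` and its parameters are hypothetical. The property it illustrates is that the number of active weights, and therefore the memory footprint, stays fixed across mask updates.

```python
import random

def prune_and_grow(weights, mask, update_fraction, rng):
    """Hypothetical, generic prune-and-grow step (NOT MEST's Elastic Mutation).

    weights: flat list of floats; mask: list of 0/1 of the same length.
    Drops the `update_fraction` of active weights with the smallest magnitude,
    then activates the same number of previously inactive positions at random,
    so the count of nonzeros stays constant.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = int(update_fraction * len(active))

    # Prune: the k active weights with the smallest absolute value.
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: random positions chosen only among those inactive before this step,
    # so a just-pruned weight cannot be regrown immediately.
    to_grow = rng.sample(inactive, min(k, len(inactive)))

    new_weights, new_mask = list(weights), list(mask)
    for i in to_prune:
        new_mask[i] = 0
        new_weights[i] = 0.0
    for i in to_grow:
        new_mask[i] = 1
        new_weights[i] = 0.0  # newly grown weights start from zero
    return new_weights, new_mask
```

For example, with four active weights and `update_fraction=0.5`, the two smallest-magnitude weights are dropped and two new positions are activated, so the mask still has four nonzeros.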
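The abstract states that unforgettable examples can be identified in situ during sparse training and then removed. It does not give the exact criterion, so the sketch below assumes the usual definition from the example-forgetting literature: a forgetting event is a transition from correctly to incorrectly classified between two consecutive presentations of an example, and an example that has been learned but never forgotten counts as unforgettable. The class `ForgettingTracker` is a hypothetical helper, not part of the MEST code base.

```python
from collections import defaultdict

class ForgettingTracker:
    """Hypothetical helper that counts forgetting events per training example.

    Assumed criterion: an example is "unforgettable" if it was classified
    correctly at least once and never went from correct to incorrect.
    """

    def __init__(self):
        self.prev_correct = {}           # example id -> correctness last time seen
        self.forgets = defaultdict(int)  # example id -> number of forgetting events
        self.ever_learned = set()        # ids classified correctly at least once

    def update(self, example_ids, correct):
        """Record per-example correctness for one mini-batch (or epoch)."""
        for idx, ok in zip(example_ids, correct):
            if self.prev_correct.get(idx, False) and not ok:
                self.forgets[idx] += 1   # correct -> incorrect: a forgetting event
            if ok:
                self.ever_learned.add(idx)
            self.prev_correct[idx] = ok

    def unforgettable(self):
        """Ids that were learned and never forgotten (candidates for removal)."""
        return sorted(i for i in self.ever_learned if self.forgets[i] == 0)
```

In use, `update` would be called with each mini-batch's predictions during training, and the ids returned by `unforgettable()` could then be dropped from the training set for later epochs. Example 1 below is forgotten once and example 2 is never learned, so only example 0 qualifies:

```python
tracker = ForgettingTracker()
tracker.update([0, 1, 2], [True, True, False])
tracker.update([0, 1, 2], [True, False, False])
tracker.update([0, 1, 2], [True, True, False])
print(tracker.unforgettable())  # [0]
```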