Over-parameterized deep neural networks (DNNs) achieve high prediction accuracy in many applications. Although effective, the large number of parameters hinders their deployment on resource-limited devices and incurs an outsized environmental impact. Sparse training (keeping a fixed number of nonzero weights in each iteration) can significantly reduce training costs by shrinking the model size. However, existing sparse training methods mainly rely on random or greedy drop-and-grow strategies, which tend to become trapped in local minima and yield low accuracy. In this work, to support explainable sparse training, we propose important-weight Exploitation and coverage Exploration to characterize Dynamic Sparse Training (DST-EE), and provide a quantitative analysis of these two metrics. We further design an acquisition function, provide theoretical guarantees for the proposed method, and clarify its convergence properties. Experimental results show that the sparse models (up to 98\% sparsity) obtained by our method outperform SOTA sparse training methods on a wide variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, and ResNet-50 / CIFAR-100, our method even achieves higher accuracy than dense models. On ResNet-50 / ImageNet, the proposed method achieves up to an 8.2\% accuracy improvement over SOTA sparse training methods.
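To make the exploitation-plus-exploration idea concrete, the sketch below shows how such an acquisition score could drive the grow step of a generic drop-and-grow update: a gradient-magnitude term (exploitation of important weights) is combined with a count-based coverage bonus (exploration of rarely selected connections). This is a minimal, assumption-laden PyTorch sketch, not the authors' exact DST-EE formulation; the names \texttt{drop\_and\_grow}, \texttt{explore\_coef}, and \texttt{select\_count} are hypothetical.

\begin{verbatim}
import torch


def drop_and_grow(weight: torch.Tensor,
                  grad: torch.Tensor,
                  mask: torch.Tensor,
                  select_count: torch.Tensor,
                  k: int,
                  explore_coef: float = 0.1) -> torch.Tensor:
    """Return an updated binary mask with the same number of nonzeros.

    weight, grad, mask, and select_count share the same shape; grad is the
    dense gradient (also defined on masked positions). k is the number of
    connections dropped and regrown in this update.
    """
    # Drop: remove the k active weights with the smallest magnitude.
    active_mag = torch.where(mask.bool(), weight.abs(),
                             torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(active_mag.flatten(), k, largest=False).indices
    mask = mask.clone().flatten()
    mask[drop_idx] = 0

    # Grow: score inactive positions with exploitation + exploration.
    exploitation = grad.abs().flatten()                 # importance of the connection
    exploration = explore_coef / torch.sqrt(
        1.0 + select_count.float().flatten())           # coverage bonus
    score = exploitation + exploration
    score[mask.bool()] = -float("inf")                  # consider only inactive positions
    grow_idx = torch.topk(score, k, largest=True).indices
    mask[grow_idx] = 1

    # Record which connections have been explored (in-place on the caller's tensor).
    flat_count = select_count.view(-1)
    flat_count[grow_idx] += 1

    return mask.view_as(weight)
\end{verbatim}

In a full sparse-training loop, a mask update of this kind would typically be applied only every few hundred iterations, with masked weights zeroed after each optimizer step so that the number of nonzero weights stays fixed.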