Over-parameterized deep neural networks (DNNs) have achieved high prediction accuracy in many applications. Although effective, the large number of parameters hinders their deployment on resource-limited devices and incurs an outsized environmental cost. Sparse training (using a fixed number of nonzero weights in each iteration) can significantly reduce training costs by shrinking the model size. However, existing sparse training methods mainly rely on random or greedy drop-and-grow strategies, which tend to get trapped in local minima and yield low accuracy. In this work, we formulate dynamic sparse training as a sparse connectivity search problem and design an acquisition function that balances exploitation and exploration to escape local optima and saddle points. We further provide theoretical guarantees for the proposed method and clarify its convergence properties. Experimental results show that sparse models (up to 98\% sparsity) obtained by our method outperform state-of-the-art (SOTA) sparse training methods on a wide variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, and ResNet-50 / CIFAR-100, our sparse models even achieve higher accuracy than their dense counterparts. On ResNet-50 / ImageNet, the proposed method yields up to 8.2\% accuracy improvement over SOTA sparse training methods.
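To make the drop-and-grow mechanism concrete, the following is a minimal sketch (not the authors' released code) of one mask update in which grown connections are ranked by an acquisition score mixing exploitation (gradient magnitude) with exploration (random noise). The function name, the balance factor \texttt{beta}, and the specific scoring rule are illustrative assumptions, not the paper's exact formulation.
\begin{verbatim}
# Illustrative sketch only: one drop-and-grow step for a single weight tensor,
# with an assumed acquisition score = beta * |grad| + (1 - beta) * noise.
import torch

def drop_and_grow(weight, grad, mask, num_update, beta=0.5):
    """Drop the smallest-magnitude active weights, then grow the same number
    of inactive connections ranked by an exploitation/exploration score."""
    active = mask.bool()
    inactive = ~active

    # Drop: remove the `num_update` active weights with the smallest magnitude.
    magnitudes = weight.abs().masked_fill(inactive, float("inf"))
    drop_idx = torch.topk(magnitudes.flatten(), num_update, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0

    # Grow: score previously inactive positions by gradient magnitude
    # (exploitation) plus uniform noise (exploration), then take the top-k.
    score = beta * grad.abs() + (1.0 - beta) * torch.rand_like(grad)
    score = score.masked_fill(active, float("-inf"))
    grow_idx = torch.topk(score.flatten(), num_update, largest=True).indices
    mask.view(-1)[grow_idx] = 1.0

    # Newly grown weights start from zero, as is common in sparse training.
    weight.data.mul_(mask)
    return mask
\end{verbatim}
Scoring grow candidates on the mask as it was before the drop step prevents just-dropped weights from being immediately regrown; the noise term is what lets the search move past local optima that a purely greedy (gradient-only) rule would settle into.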