Despite impressive performance on a wide variety of tasks, deep neural networks incur significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these costs; however, the sparsity constraints add difficulty to the optimization, leading to longer training times and instability. In this work, we aim to overcome this problem and achieve space-time co-efficiency. To accelerate and stabilize the convergence of sparse training, we analyze the gradient changes and develop an adaptive gradient correction method. Specifically, we approximate the correlation between the current and previous gradients and use it to balance the two gradients and obtain a corrected gradient. Our method can be used with the most popular sparse training pipelines under both standard and adversarial setups. Theoretically, we prove that our method accelerates the convergence rate of sparse training. Extensive experiments on multiple datasets, model architectures, and sparsity levels demonstrate that our method outperforms leading sparse training methods by up to \textbf{5.0\%} in accuracy given the same number of training epochs, and reduces the number of training epochs by up to \textbf{52.1\%} to achieve the same accuracy.
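As a rough illustration of the balancing step described in the abstract, the sketch below blends the current and previous gradients according to an estimated correlation. This is a minimal, hypothetical example: the cosine-style correlation estimate, the clipping to $[0,1]$, and the function name \texttt{corrected\_gradient} are assumptions for illustration, not the paper's exact formulation.

\begin{verbatim}
import numpy as np

def corrected_gradient(grad, prev_grad, eps=1e-12):
    """Illustrative gradient correction (hypothetical, not the paper's rule).

    Approximates the correlation between the current and previous gradients
    and uses it to weight a convex combination of the two.
    """
    g, p = grad.ravel(), prev_grad.ravel()
    # Cosine-similarity-style correlation estimate, clipped to [0, 1].
    corr = float(g @ p / (np.linalg.norm(g) * np.linalg.norm(p) + eps))
    corr = float(np.clip(corr, 0.0, 1.0))
    # The more correlated the gradients, the more the previous gradient is reused.
    return (1.0 - corr) * grad + corr * prev_grad

# Example usage with random gradients standing in for sparse-training gradients.
rng = np.random.default_rng(0)
g_t, g_prev = rng.normal(size=100), rng.normal(size=100)
g_corrected = corrected_gradient(g_t, g_prev)
\end{verbatim}

In this reading, the corrected gradient reduces to the raw gradient when consecutive gradients are uncorrelated and leans on gradient history when they agree; the paper's actual correction rule and its adaptivity may differ.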