Turning weights to zero when training a neural network helps reduce the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during training, our work combines soft-thresholding and straight-through gradient estimation to update the raw, i.e. non-thresholded, version of zeroed weights. Our method, named ST-3 for straight-through/soft-thresholding/sparse-training, obtains state-of-the-art (SoA) results, in terms of both accuracy/sparsity and accuracy/FLOPS trade-offs, when progressively increasing the sparsity ratio in a single training cycle. In particular, despite its simplicity, ST-3 compares favorably to the most recent methods adopting differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredients for effective sparsification primarily lie in the ability to give the weights the freedom to evolve smoothly across the zero state while progressively increasing the sparsity ratio. Source code and weights are available at https://github.com/vanderschuea/stthree
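To make the core idea concrete, here is a minimal PyTorch sketch of soft-thresholding combined with a straight-through gradient estimator, in the spirit of the abstract above. The function name `soft_threshold_ste` and the quantile-based choice of the threshold `tau` are illustrative assumptions for this sketch, not the repository's actual API; see the linked source code for the authors' implementation.

```python
import torch

def soft_threshold_ste(w_raw: torch.Tensor, tau: float) -> torch.Tensor:
    """Soft-threshold weights with a straight-through gradient.

    Forward: w = sign(w_raw) * max(|w_raw| - tau, 0), so every weight
    whose magnitude is below tau becomes exactly zero.
    Backward: the detach trick below makes d(w)/d(w_raw) = 1 everywhere,
    so gradients keep updating the raw, non-thresholded weights and a
    zeroed weight can smoothly re-cross zero later in training.
    """
    w_st = torch.sign(w_raw) * torch.relu(w_raw.abs() - tau)
    # Forward value is w_st; gradient flows to w_raw as identity.
    return w_raw + (w_st - w_raw).detach()

# Hypothetical usage: pick tau so that a target fraction of weights is
# zeroed (the target would be progressively increased over training),
# then apply the thresholded weights in a layer's forward pass.
w_raw = torch.randn(256, 128, requires_grad=True)
sparsity = 0.5                                          # current target ratio
tau = w_raw.abs().flatten().quantile(sparsity).item()   # magnitude threshold
w = soft_threshold_ste(w_raw, tau)
print((w == 0).float().mean())  # ~0.5 of the weights are exactly zero
```

Because the straight-through estimator keeps the raw weights as the optimized variables, increasing the sparsity ratio only moves the threshold, never clamps the underlying parameters, which is what avoids the sharp weight discontinuities the abstract refers to.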