When training neural networks, dying neurons, i.e., units that become inactive or saturated, are traditionally seen as harmful. This paper sheds new light on this phenomenon. By exploring the impact of various hyperparameter configurations on dying neurons during training, we gather insights on how to improve sparse training approaches to pruning. We introduce Demon Pruning (DemP), a method that controls the proliferation of dead neurons through a combination of noise injection on active units and a one-cycle regularization schedule, dynamically inducing network sparsity during training. Experiments on the CIFAR-10 and ImageNet datasets demonstrate that DemP outperforms existing dense-to-sparse structured pruning methods, achieving better accuracy-sparsity tradeoffs and accelerating training by up to 3.56$\times$. These findings offer a novel perspective on dying neurons as a resource for efficient model compression and optimization.
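To make the core mechanism concrete, below is a minimal sketch of noise injection restricted to active units, paired with a one-cycle schedule for the noise magnitude. This is an illustrative assumption of how such a step could look in a PyTorch-style setup, not the authors' implementation; the function names (`one_cycle`, `inject_noise_on_active_units`) and the triangular schedule are hypothetical.

```python
# Hypothetical sketch: inject noise only on units that are still active,
# with a one-cycle schedule controlling the noise magnitude.
import torch
import torch.nn as nn


def one_cycle(step: int, total_steps: int, peak: float = 1.0) -> float:
    """Triangular one-cycle schedule: ramps 0 -> peak -> 0 over training."""
    half = total_steps / 2
    return peak * (step / half if step <= half else (total_steps - step) / half)


@torch.no_grad()
def inject_noise_on_active_units(layer: nn.Linear, activations: torch.Tensor,
                                 step: int, total_steps: int,
                                 sigma: float = 1e-3) -> None:
    """Add Gaussian noise to the incoming weights of active units.

    `activations` are post-ReLU outputs of `layer` for a batch
    (shape: batch x out_features); a unit counts as active if it
    fired on at least one example. Dead units are left untouched,
    so they stay prunable.
    """
    scale = sigma * one_cycle(step, total_steps)
    active = (activations > 0).any(dim=0)            # (out_features,)
    noise = torch.randn_like(layer.weight) * scale   # (out_features, in_features)
    layer.weight[active] += noise[active]
```

In a training loop, this would be called after each optimizer step with the batch's post-activation outputs; the one-cycle schedule means the perturbation is strongest mid-training and fades toward the end, letting the surviving units settle.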