The rectified linear unit (ReLU) is a highly successful activation function in neural networks, as it allows networks to easily obtain sparse representations, which reduces overfitting in overparameterized networks. However, in network pruning, we find that the sparsity introduced by ReLU, which we quantify by a term called the dynamic dead neuron rate (DNR), is not beneficial for the pruned network. Interestingly, the more the network is pruned, the smaller the dynamic DNR becomes during optimization. This motivates us to propose a method that explicitly reduces the dynamic DNR for the pruned network, i.e., de-sparsifies the network. We refer to our method as Activating-while-Pruning (AP). We note that AP does not function as a stand-alone method, as it does not evaluate the importance of weights. Instead, it works in tandem with existing pruning methods and aims to improve their performance by selectively activating nodes to reduce the dynamic DNR. We conduct extensive experiments on popular networks (e.g., ResNet, VGG) using two classical and three state-of-the-art pruning methods. The experimental results on public datasets (e.g., CIFAR-10/100) suggest that AP works well with existing pruning methods and improves their performance by 3%-4%. On larger-scale datasets (e.g., ImageNet) and state-of-the-art networks (e.g., vision transformers), we observe an improvement of 2%-3% with AP compared to without. Lastly, we conduct an ablation study to examine the effectiveness of the components comprising AP.
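To make the notion of ReLU-induced sparsity concrete, the sketch below measures the fraction of ReLU units that output zero for every sample in a batch. This is only an illustrative proxy under our own assumptions (a PyTorch model, forward hooks on `nn.ReLU` modules, and a "dead on this batch" criterion); the paper's exact definition of the dynamic DNR may differ.

```python
import torch
import torch.nn as nn

def dead_neuron_rate(model, inputs):
    """Fraction of ReLU units whose output is zero for all samples in the batch.

    Illustrative proxy for a dead-neuron-rate-style measure; not the paper's
    exact dynamic DNR definition.
    """
    rates, hooks = [], []

    def hook(_module, _inp, out):
        flat = out.flatten(start_dim=1)      # (batch, units)
        dead = (flat == 0).all(dim=0)        # unit never activates in this batch
        rates.append(dead.float().mean().item())

    for m in model.modules():
        if isinstance(m, nn.ReLU):
            hooks.append(m.register_forward_hook(hook))

    with torch.no_grad():
        model(inputs)

    for h in hooks:
        h.remove()
    return sum(rates) / len(rates) if rates else 0.0

# Hypothetical usage: a small MLP and a random batch.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10), nn.ReLU())
print(dead_neuron_rate(model, torch.randn(128, 32)))
```

Tracking such a quantity over training batches would give a "dynamic" view of how many units the network effectively switches off as pruning proceeds.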