Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has long been deemed uncompetitive in both post-training pruning and sparse training. In this paper, we focus on sparse training and highlight a perhaps counter-intuitive finding: random pruning at initialization can be quite powerful for the sparse training of modern neural networks. Without any delicate pruning criteria or carefully pursued sparsity structures, we empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent. Two key factors contribute to this revival: (i) network size matters: as the original dense networks grow wider and deeper, the performance of a sparsely trained, randomly pruned network quickly rises to match that of its dense equivalent, even at high sparsity ratios; (ii) appropriate layer-wise sparsity ratios can be pre-chosen for sparse training, which turns out to be another important performance booster. Simple as it looks, a randomly pruned subnetwork of Wide ResNet-50 can be sparsely trained to outperform a dense Wide ResNet-50 on ImageNet. We also observe that such randomly pruned networks outperform their dense counterparts in other desirable respects, such as out-of-distribution detection, uncertainty estimation, and adversarial robustness. Overall, our results strongly suggest there is larger-than-expected room for sparse training at scale, and that the benefits of sparsity may extend well beyond carefully designed pruning. Our source code can be found at https://github.com/VITA-Group/Random_Pruning.
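For concreteness, the sketch below illustrates the kind of procedure the abstract describes: random binary masks are drawn at initialization according to pre-chosen layer-wise sparsity ratios (uniform here, for simplicity; the paper also studies other layer-wise choices), and the masked network is then trained as usual. This is an illustrative PyTorch sketch, not the released implementation; the helper names random_masks and apply_masks are hypothetical.

```python
# Minimal sketch: random pruning at initialization with pre-chosen
# layer-wise sparsity ratios. Illustrative only; see the linked
# repository for the authors' actual implementation.
import torch
import torch.nn as nn


def random_masks(model: nn.Module, layer_sparsity: dict) -> dict:
    """Draw one random binary mask per prunable layer at initialization."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            sparsity = layer_sparsity.get(name, 0.0)  # fraction of weights to remove
            weight = module.weight
            n_zero = int(sparsity * weight.numel())
            if n_zero == 0:
                masks[name] = torch.ones_like(weight)
                continue
            scores = torch.rand_like(weight)                     # random scores -> random pruning
            threshold = scores.flatten().kthvalue(n_zero).values  # n_zero smallest scores get pruned
            masks[name] = (scores > threshold).float()
    return masks


def apply_masks(model: nn.Module, masks: dict) -> None:
    """Zero out pruned weights; re-apply after each optimizer step to keep them zero."""
    with torch.no_grad():
        for name, module in model.named_modules():
            if name in masks:
                module.weight.mul_(masks[name])


# Usage sketch: uniform 90% sparsity on every prunable layer of a tiny model.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(), nn.Linear(16 * 30 * 30, 10))
sparsity = {n: 0.9 for n, m in model.named_modules() if isinstance(m, (nn.Conv2d, nn.Linear))}
masks = random_masks(model, sparsity)
apply_masks(model, masks)  # then train from scratch, re-applying masks after each update
```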