Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50 and MobileNets on ImageNet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during optimization can overcome local minima encountered when the topology remains static. Code used in our work can be found at github.com/google-research/rigl.
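To make the topology-update idea concrete, the following is a minimal sketch of one drop-and-grow connectivity step, assuming the update drops the lowest-magnitude active weights and regrows the same number of connections where the (infrequently computed) dense gradient is largest. The function name `rigl_update`, the `drop_fraction` parameter, and the zero initialization of regrown weights are illustrative assumptions, not the reference implementation from the repository above.

```python
import numpy as np

def rigl_update(weights, mask, grad, drop_fraction=0.3):
    """Sketch of one connectivity update for a single layer.

    weights, mask, grad: arrays of the same shape; mask is binary (1 = active).
    Returns updated (weights, mask) with the same number of active connections.
    """
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(mask == 0)
    k = int(drop_fraction * active.size)

    # Drop: active connections with the smallest parameter magnitude.
    drop_idx = active[np.argsort(np.abs(weights.ravel()[active]))[:k]]

    # Grow: inactive connections with the largest gradient magnitude
    # (the dense gradient is only needed at these infrequent update steps).
    grow_idx = inactive[np.argsort(-np.abs(grad.ravel()[inactive]))[:k]]

    new_mask = mask.copy().ravel()
    new_mask[drop_idx] = 0
    new_mask[grow_idx] = 1

    new_weights = weights.copy().ravel()
    new_weights[drop_idx] = 0.0   # dropped connections are zeroed out
    new_weights[grow_idx] = 0.0   # assumption: regrown weights start at zero
    return new_weights.reshape(weights.shape), new_mask.reshape(mask.shape)
```

Because the number of dropped and regrown connections is equal, the parameter count and per-step computational cost stay fixed throughout training, matching the constraint described in the abstract.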