As the optimization problem of pruning a neural network is nonconvex and the available strategies are only guaranteed to find local solutions, a good initialization becomes paramount. To this end, we present the Amenable Sparse Network Investigator (ASNI) algorithm, which learns a sparse network whose initialization is compressed. The sparse structure found by ASNI is amenable because its corresponding initialization, which ASNI also learns, consists of only 2L numbers, where L is the number of layers. Requiring just a few numbers to initialize the parameters of the learned sparse network is what makes it amenable. The learned initialization set consists of L signed pairs that act as the centroids of the parameter values of each layer. These centroids are learned by ASNI after only a single round of training. We show experimentally that the learned centroids are sufficient to initialize the nonzero parameters of the learned sparse structure so that it approximately matches the accuracy of the non-sparse network. We also show empirically that learning the centroids requires pruning the network globally and gradually. Hence, for parameter pruning we propose a novel strategy based on a sigmoid function that specifies the sparsity percentage across the network globally; pruning is then done magnitude-wise after each epoch of training. We have performed a series of experiments utilizing ResNet, VGG-style, small convolutional, and fully connected networks on the ImageNet, CIFAR10, and MNIST datasets.
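To make the described components concrete, the following is a minimal PyTorch-style sketch of a sigmoid-shaped global sparsity schedule, magnitude-based global pruning applied after each epoch, and extraction of per-layer signed centroids. The function names, the exact sigmoid parameterization, and the treatment of each parameter tensor as one "layer" are illustrative assumptions; the abstract does not specify these details.

```python
import math
import torch

def sigmoid_sparsity_schedule(epoch, total_epochs, final_sparsity=0.9, steepness=10.0):
    # Hypothetical sigmoid schedule: the target global sparsity grows smoothly
    # from near zero toward `final_sparsity` as training progresses.
    # (The exact parameterization used by ASNI is not given in the abstract.)
    t = epoch / total_epochs
    return final_sparsity / (1.0 + math.exp(-steepness * (t - 0.5)))

def global_magnitude_prune(model, sparsity):
    # Global, gradual magnitude pruning: rank all parameters across the whole
    # network by absolute value and zero out the smallest `sparsity` fraction.
    all_weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = int(sparsity * all_weights.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(all_weights, k).values
    with torch.no_grad():
        for p in model.parameters():
            p.masked_fill_(p.abs() <= threshold, 0.0)

def layer_centroids(model):
    # One signed pair (c_plus, c_minus) per parameter tensor: the means of the
    # surviving positive and negative values, which would serve to re-initialize
    # the nonzero entries of the learned sparse structure (2L numbers for L layers).
    centroids = []
    for p in model.parameters():
        pos, neg = p[p > 0], p[p < 0]
        c_plus = pos.mean().item() if pos.numel() > 0 else 0.0
        c_minus = neg.mean().item() if neg.numel() > 0 else 0.0
        centroids.append((c_plus, c_minus))
    return centroids
```

In such a sketch, one training round would interleave an epoch of SGD with `global_magnitude_prune(model, sigmoid_sparsity_schedule(epoch, total_epochs))`, and `layer_centroids(model)` would be called at the end to obtain the 2L initialization values.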