We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network during the early stages of training. Thus, the computational cost of subsequent training iterations, as well as that of inference, is considerably reduced. Our method, based on variational inference principles using Gaussian scale mixture priors on the neural network weights, learns the variational posterior distribution of Bernoulli random variables multiplying the units/filters, similarly to adaptive dropout. Our algorithm ensures that the Bernoulli parameters practically converge to either 0 or 1, establishing a deterministic final network. We analytically derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection and leads to consistent pruning levels and prediction accuracy regardless of weight initialization or the size of the starting network. We prove the convergence properties of our algorithm, establishing theoretical and practical pruning conditions. We evaluate the proposed algorithm on the MNIST and CIFAR-10 data sets and the commonly used fully connected and convolutional LeNet and VGG16 architectures. The simulations show that our method achieves pruning levels on par with state-of-the-art methods for structured pruning, while maintaining better test accuracy and, more importantly, doing so in a manner robust with respect to network initialization and initial size.
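To make the gating idea concrete, the following is a minimal sketch, assuming PyTorch, of a layer whose output units are multiplied by Bernoulli gate variables with learnable probabilities, in the spirit of adaptive dropout. The class name, the relaxed-Bernoulli (concrete) sampling used to keep the gates differentiable, and the 0.5 threshold at test time are illustrative choices of ours, not the paper's algorithm; they only indicate how units whose keep-probability converges to 0 could be pruned from the final deterministic network.

import torch
import torch.nn as nn

class BernoulliGatedLinear(nn.Module):
    """Linear layer whose output units are multiplied by learnable Bernoulli gates (illustrative sketch)."""
    def __init__(self, in_features, out_features, temperature=0.1):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Logits of the per-unit keep-probabilities (variational Bernoulli parameters).
        self.gate_logits = nn.Parameter(torch.zeros(out_features))
        self.temperature = temperature

    def forward(self, x):
        h = self.linear(x)
        if self.training:
            # Relaxed Bernoulli sample so gradients flow into gate_logits during training.
            u = torch.rand_like(self.gate_logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log(1 - u)
            z = torch.sigmoid((self.gate_logits + noise) / self.temperature)
        else:
            # Deterministic gates once the probabilities have (practically) converged to 0 or 1.
            z = (torch.sigmoid(self.gate_logits) > 0.5).float()
        return h * z

    def keep_probabilities(self):
        return torch.sigmoid(self.gate_logits).detach()

# Usage: units whose keep-probability stays near zero can be removed, shrinking the network.
layer = BernoulliGatedLinear(784, 300)
out = layer(torch.randn(32, 784))
print(layer.keep_probabilities()[:5])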