We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network during the early stages of training. Thus, the computational cost of subsequent training iterations, as well as that of inference, is considerably reduced. Our method, based on variational inference principles, learns the posterior distribution of Bernoulli random variables that multiply the units/filters, similarly to adaptive dropout. We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection, in a way that the Bernoulli parameters practically converge to either 0 or 1, establishing a deterministic final network. Our algorithm is robust in the sense that it achieves consistent pruning levels and prediction accuracy regardless of weight initialization or the size of the starting network. We provide an analysis of its convergence properties, establishing theoretical and practical pruning conditions. We evaluate the proposed algorithm on the MNIST dataset and commonly used fully connected and convolutional LeNet architectures. The simulations show that our method achieves pruning levels on par with state-of-the-art methods for structured pruning, while maintaining better test accuracy and, more importantly, in a manner that is robust with respect to network initialization and initial size.
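To make the gating idea concrete, the following is a minimal sketch (not the authors' exact algorithm) of multiplicative Bernoulli gates on hidden units with learnable keep probabilities, in the spirit of adaptive dropout. The module name, the initialization value, and the straight-through gradient estimator are our own illustrative assumptions; the paper's variational objective and the hyper-prior over the prior parameters are not reproduced here.

```python
# Illustrative sketch only: hidden units are multiplied by Bernoulli gates
# z ~ Bernoulli(theta), where theta is a learnable per-unit probability.
# Gradients reach theta via a simple straight-through estimator; the paper's
# variational objective and hyper-prior over the gate priors are omitted.
import torch
import torch.nn as nn

class BernoulliGate(nn.Module):
    """Per-unit multiplicative Bernoulli gate with a learnable keep probability."""

    def __init__(self, num_units: int, init_logit: float = 2.0):
        super().__init__()
        # Logits of the gate probabilities theta (sigmoid(2.0) ~ 0.88, so most
        # units are active at initialization). init_logit is an assumed value.
        self.logits = nn.Parameter(torch.full((num_units,), init_logit))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = torch.sigmoid(self.logits)
        if self.training:
            z = torch.bernoulli(theta)
            # Straight-through: forward pass uses the hard sample, the backward
            # pass differentiates through theta.
            z = z + theta - theta.detach()
        else:
            # If theta has converged near 0 or 1, thresholding yields a
            # deterministic pruned network at inference time.
            z = (theta > 0.5).float()
        return x * z

# Usage: gate the hidden layer of a small fully connected network.
net = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), BernoulliGate(300), nn.Linear(300, 10))
logits = net(torch.randn(8, 784))  # shape (8, 10)
```

Units whose gate probability converges to 0 can be removed from the network together with their incoming and outgoing weights, which is what reduces the cost of subsequent training iterations and of inference.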