Pruning the weights of neural networks is an effective and widely-used technique for reducing model size and inference complexity. We develop and test a novel method based on compressed sensing which combines pruning and training into a single step. Specifically, we utilize an adaptively weighted $\ell^1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks. The adaptive weighting we introduce corresponds to a novel regularizer based on the logarithm of the absolute value of the weights. We perform a series of ablation studies demonstrating the improvement provided by the adaptive weighting and the generalized RDA algorithm. Furthermore, numerical experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that our method 1) trains sparser, more accurate networks than existing state-of-the-art methods; 2) can be used to train sparse networks from scratch, i.e., from a random initialization, as opposed to initializing from a well-trained base model; 3) acts as an effective regularizer, improving generalization accuracy.
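To make the described update concrete, the sketch below shows how an adaptively weighted $\ell^1$ penalty can be combined with a standard closed-form RDA step (in the style of Xiao's $\ell^1$-RDA), where the per-weight penalty is scaled as $\lambda_i \propto 1/(|w_i| + \varepsilon)$, the reweighting associated with a $\log(|w| + \varepsilon)$ regularizer. This is an illustrative assumption of how the pieces fit together, not the paper's exact generalized RDA algorithm; the function name `rda_l1_step` and the parameters `lam_base` and `gamma` are hypothetical.

```python
import numpy as np

def rda_l1_step(w, grad_avg, t, lam_base, gamma, eps=1e-8):
    """One closed-form step of l1-regularized dual averaging with
    adaptive per-weight penalties lam_i = lam_base / (|w_i| + eps).

    Illustrative sketch only; the paper's generalized RDA update may differ.

    w        : current weights (used only to set the adaptive penalty)
    grad_avg : running average of (sub)gradients, (1/t) * sum_{s<=t} g_s
    t        : iteration counter (>= 1)
    lam_base : base l1 penalty strength
    gamma    : scaling constant, with beta_t = gamma * sqrt(t)
    """
    lam = lam_base / (np.abs(w) + eps)        # adaptive per-weight l1 weights
    beta_t = gamma * np.sqrt(t)               # standard RDA choice of beta_t
    shrunk = np.maximum(np.abs(grad_avg) - lam, 0.0)   # soft-threshold the averaged gradient
    return -(t / beta_t) * np.sign(grad_avg) * shrunk  # exactly zero wherever |grad_avg| <= lam


# Minimal usage sketch: maintain the running gradient average and apply the step.
rng = np.random.default_rng(0)
w = rng.normal(size=100)
grad_avg = np.zeros_like(w)
for t in range(1, 101):
    g = rng.normal(size=100)                  # stand-in for a stochastic gradient
    grad_avg = ((t - 1) * grad_avg + g) / t   # running average of gradients
    w = rda_l1_step(w, grad_avg, t, lam_base=0.05, gamma=5.0)
```

Because the update soft-thresholds the averaged gradient rather than the weights, many coordinates are set exactly to zero during training, which is how sparsity emerges without a separate pruning step.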