Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference. However, current methods are insufficient to enable this optimization and lead to a large degradation in model performance. In this paper, we identify a fundamental limitation in the formulation of current methods, namely that their saliency criteria look at a single step at the start of training without taking into account the trainability of the network. While pruning iteratively and gradually has been shown to improve pruning performance, explicit consideration of the training stage that will immediately follow pruning has so far been absent from the computation of the saliency criterion. To overcome the short-sightedness of existing methods, we propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune. ProsPr combines an estimate of the higher-order effects of pruning on the loss and the optimization trajectory to identify the trainable sub-network. Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
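To make the meta-gradient idea concrete, below is a minimal sketch in PyTorch of how a pruning saliency could be computed by differentiating through a few unrolled training steps, in the spirit described above. The function name `compute_saliency`, the use of plain SGD as the inner optimizer, the number of unrolled steps, and the learning rate are illustrative assumptions, not the authors' reference implementation.

```python
# A hedged sketch: per-weight saliency as the meta-gradient of the loss after a few
# unrolled SGD steps, taken with respect to a pruning mask applied at initialization.
# All names and hyperparameters here are assumptions for illustration only.
import torch
import torch.nn.functional as F


def compute_saliency(model, batches, lr=0.1, unroll_steps=3):
    """Return a dict of per-weight saliency scores for `model`.

    `batches` must contain at least `unroll_steps + 1` (inputs, targets) pairs:
    the first `unroll_steps` drive the unrolled inner updates, the last one
    evaluates the meta-loss.
    """
    params = {name: p.detach() for name, p in model.named_parameters()}
    masks = {name: torch.ones_like(p, requires_grad=True) for name, p in params.items()}

    # Apply the (initially all-ones) mask to the initial weights.
    weights = {name: p * masks[name] for name, p in params.items()}

    # Unroll a few inner SGD steps, keeping the graph so gradients flow back to the masks.
    for x, y in batches[:unroll_steps]:
        out = torch.func.functional_call(model, weights, (x,))
        loss = F.cross_entropy(out, y)
        grads = torch.autograd.grad(loss, list(weights.values()), create_graph=True)
        weights = {name: w - lr * g for (name, w), g in zip(weights.items(), grads)}

    # Meta-loss: loss on a held-out batch after the unrolled steps.
    x, y = batches[unroll_steps]
    meta_loss = F.cross_entropy(torch.func.functional_call(model, weights, (x,)), y)
    meta_grads = torch.autograd.grad(meta_loss, list(masks.values()))

    # Saliency: magnitude of the meta-gradient; lowest-scoring weights would be pruned.
    return {name: g.abs() for name, g in zip(masks.keys(), meta_grads)}
```

Because each weight is written as `mask * weight`, the meta-gradient with respect to the mask already folds in both the weight's magnitude and its influence on the loss after the unrolled updates, which is what distinguishes this style of criterion from single-step saliencies computed only at initialization.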