Recent network pruning methods focus on pruning models early-on in training. To estimate the impact of removing a parameter, these methods use importance measures that were originally designed to prune trained models. Despite lacking justification for their use early-on in training, such measures result in surprisingly low accuracy loss. To better explain this behavior, we develop a general framework that uses gradient flow to unify state-of-the-art importance measures through the norm of model parameters. We use this framework to determine the relationship between pruning measures and evolution of model parameters, establishing several results related to pruning models early-on in training: (i) magnitude-based pruning removes parameters that contribute least to reduction in loss, resulting in models that converge faster than magnitude-agnostic methods; (ii) loss-preservation based pruning preserves first-order model evolution dynamics and is therefore appropriate for pruning minimally trained models; and (iii) gradient-norm based pruning affects second-order model evolution dynamics, such that increasing gradient norm via pruning can produce poorly performing models. We validate our claims on several VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10/CIFAR-100. Code available at https://github.com/EkdeepSLubana/flowandprune.
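The three importance measures contrasted above can be sketched numerically. Below is a minimal illustration, not the paper's implementation: it uses a toy quadratic loss L(w) = ½ wᵀAw (so the gradient is Aw and the Hessian is A) purely to show how magnitude-based, loss-preservation, and gradient-norm scores are formed per parameter. The score formulas follow the standard forms these families use (|θ|², |θ·∇L|, and |θ·H∇L|); the matrix `A` and all variable names are illustrative assumptions.

```python
import numpy as np

# Toy model: loss L(w) = 0.5 * w^T A w for a fixed PSD matrix A,
# so grad = A w and Hessian = A. Illustration only -- not the
# paper's actual networks or its exact derivation.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A @ A.T                      # make A positive semi-definite (toy Hessian)
w = rng.standard_normal(5)       # toy "model parameters"

grad = A @ w                     # dL/dw for the quadratic loss
Hg = A @ grad                    # Hessian-gradient product

# Per-parameter importance scores (parameters with the smallest
# score are pruned first in each scheme):
magnitude_score = w ** 2              # magnitude-based pruning
loss_pres_score = np.abs(w * grad)    # loss-preservation (|theta * grad|)
grad_norm_score = np.abs(w * Hg)      # gradient-norm based (|theta * H grad|)
```

Under the paper's gradient-flow framing, all three scores are tied to how the parameter norm and loss evolve during training, which is why pruning by the smallest `magnitude_score` removes the parameters contributing least to loss reduction.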