Many recent works have shown that trainability plays a central role in neural network pruning -- unattended, broken trainability can lead to severe under-performance and unintentionally amplify the effect of the retraining learning rate, resulting in biased (or even misinterpreted) benchmark results. This paper introduces trainability preserving pruning (TPP), a scalable method that preserves network trainability against pruning, aiming for improved pruning performance and greater robustness to retraining hyper-parameters (e.g., learning rate). Specifically, we propose to penalize the Gram matrix of the convolutional filters so as to decorrelate the pruned filters from the retained filters. Beyond the convolutional layers, in the spirit of preserving the trainability of the whole network, we also propose to regularize the batch normalization parameters (scale and bias). Empirical studies on linear MLP networks show that TPP performs on par with the oracle trainability recovery scheme. On nonlinear ConvNets (ResNet56/VGG19) on CIFAR10/100, TPP outperforms its counterpart approaches by a clear margin. Moreover, results with ResNets on ImageNet-1K suggest that TPP consistently performs favorably against other top-performing structured pruning approaches. Code: https://github.com/MingSun-Tse/TPP.
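As an illustration of the Gram-matrix decorrelation idea described above, the following is a minimal PyTorch sketch. It assumes a hypothetical `tpp_decorrelation_loss` helper and a boolean `pruned_mask` interface; the actual penalty form and its interaction with the batch normalization regularizer may differ in the released code.

```python
import torch

def tpp_decorrelation_loss(conv_weight: torch.Tensor,
                           pruned_mask: torch.Tensor) -> torch.Tensor:
    """Sketch of a Gram-matrix penalty that decorrelates pruned filters
    from retained filters (hypothetical interface, not the paper's code).

    conv_weight: (out_channels, in_channels, kH, kW) filter tensor.
    pruned_mask: bool tensor of length out_channels; True marks filters
                 scheduled to be pruned.
    """
    # Flatten each filter into a row vector: (out_channels, in*kH*kW).
    w = conv_weight.flatten(start_dim=1)
    # Gram matrix of the filters: (out_channels, out_channels).
    gram = w @ w.t()
    # Penalize only the entries coupling pruned filters with retained ones,
    # pushing pruned filters toward orthogonality with the kept filters.
    kept_mask = ~pruned_mask
    cross = gram[pruned_mask][:, kept_mask]
    return (cross ** 2).sum()

# Example usage: add the penalty to the task loss with a small coefficient.
# total_loss = task_loss + reg_coeff * tpp_decorrelation_loss(conv.weight, mask)
```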