Several recent works empirically find that the finetuning learning rate is critical to the final performance in neural network structured pruning. Further research finds that the network trainability broken by pruning is responsible, which calls for recovering trainability before finetuning. Existing attempts exploit weight orthogonalization to achieve dynamical isometry for improved trainability. However, they only work for linear MLP networks. How to develop a filter pruning method that maintains or recovers trainability and scales to modern deep networks remains an open problem. In this paper, we present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification. Specifically, TPP regularizes the Gram matrix of convolutional kernels so as to decorrelate the pruned filters from the kept filters. Besides the convolutional layers, we also propose to regularize the BN parameters to better preserve trainability. Empirically, TPP can compete with the ground-truth dynamical isometry recovery method on linear MLP networks. On non-linear networks (ResNet56/VGG19, CIFAR datasets), it outperforms the counterpart solutions by a large margin. Moreover, TPP also works effectively with modern deep networks (ResNets) on ImageNet, delivering encouraging performance in comparison to many top-performing filter pruning methods. To the best of our knowledge, this is the first approach that effectively maintains trainability during pruning for large-scale deep neural networks.
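To make the Gram-matrix idea concrete, the following is a minimal, dependency-free sketch rather than the exact TPP objective. It assumes each convolutional filter has already been flattened into a vector, and it assumes one plausible form of the penalty: the sum of squared Gram entries between pruned and kept filters. The function names `gram_matrix` and `decorrelation_penalty` and the toy filters are hypothetical and chosen only for illustration.

```python
def gram_matrix(filters):
    """Return G where G[i][j] is the dot product of flattened filters i and j."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in filters]
            for fi in filters]


def decorrelation_penalty(filters, pruned):
    """Sum of squared Gram entries between pruned and kept filters.

    Assumed form of the regularizer: it is zero exactly when every pruned
    filter is orthogonal to every kept filter.
    """
    pruned = set(pruned)
    gram = gram_matrix(filters)
    kept = [j for j in range(len(filters)) if j not in pruned]
    return sum(gram[i][j] ** 2 for i in pruned for j in kept)


# Three toy filters, each already flattened to a vector of length 4.
filters = [
    [1.0, 0.0, 0.0, 0.0],  # kept
    [0.0, 1.0, 0.0, 0.0],  # kept
    [1.0, 1.0, 0.0, 1.0],  # marked for pruning; overlaps with both kept filters
]
penalty = decorrelation_penalty(filters, pruned=[2])
```

In a training loop, a term of this kind would be added to the task loss with a weighting coefficient, so that gradient descent pushes the pruned filters toward orthogonality with the kept ones before they are removed. The penalty above ignores correlations among the pruned filters themselves, which is a simplification made for this sketch.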