We explore a new perspective on adapting the learning rate (LR) schedule to improve the performance of ReLU-based networks as they are iteratively pruned. Our work and contributions consist of four parts: (i) We find that, as a ReLU-based network is iteratively pruned, the distribution of its weight gradients tends to become narrower. This leads to the finding that, as the network becomes sparser, a larger LR value should be used to train the pruned network. (ii) Motivated by this finding, we propose a novel LR schedule, called S-Cyclical (S-Cyc), which adapts the conventional cyclical LR schedule by gradually increasing the LR upper bound (max_lr) in an S-shape as the network is iteratively pruned. We highlight that S-Cyc is a method-agnostic LR schedule that applies to many iterative pruning methods. (iii) We evaluate the performance of the proposed S-Cyc and compare it to four LR schedule benchmarks. Our experimental results on three state-of-the-art networks (VGG-19, ResNet-20, ResNet-50) and two popular datasets (CIFAR-10, ImageNet-200) demonstrate that S-Cyc consistently outperforms the best-performing benchmark with an improvement of 2.1% to 3.4%, without a substantial increase in complexity. (iv) We evaluate S-Cyc against an oracle that carefully tunes max_lr via grid search, and show that S-Cyc achieves comparable performance.
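To make the mechanism concrete, the following is a minimal sketch of how an S-shaped growth of max_lr across pruning rounds could be combined with a standard triangular cyclical schedule within each round. This is not the paper's implementation: the logistic parameterization, the bounds `lr_low`/`lr_high`, and the `steepness` value are all illustrative assumptions.

```python
import math

def s_cyc_max_lr(round_idx, n_rounds, lr_low=0.1, lr_high=0.5, steepness=8.0):
    """Hypothetical S-shaped growth of max_lr across pruning rounds.

    Maps pruning round 0..n_rounds-1 onto a logistic curve rising from
    lr_low toward lr_high; the exact parameterization is an assumption,
    not the formula from the paper.
    """
    # Normalize the round index to [-1, 1] so the sigmoid is centered
    # halfway through the pruning schedule.
    t = 2.0 * round_idx / max(n_rounds - 1, 1) - 1.0
    s = 1.0 / (1.0 + math.exp(-steepness * t))  # logistic "S" in (0, 1)
    return lr_low + (lr_high - lr_low) * s

def cyclical_lr(step, steps_per_cycle, base_lr, max_lr):
    """Conventional triangular cyclical LR within one pruning round."""
    cycle_pos = (step % steps_per_cycle) / steps_per_cycle
    # Rise linearly for the first half of the cycle, fall for the second.
    scale = 2.0 * cycle_pos if cycle_pos < 0.5 else 2.0 * (1.0 - cycle_pos)
    return base_lr + (max_lr - base_lr) * scale

# Example: max_lr grows in an S-shape over 10 pruning rounds, and each
# round trains with a cyclical schedule bounded above by that max_lr.
for r in range(10):
    max_lr = s_cyc_max_lr(r, n_rounds=10)
    print(f"round {r}: max_lr = {max_lr:.3f}, "
          f"mid-cycle lr = {cyclical_lr(500, 1000, 0.01, max_lr):.3f}")
```

In this sketch, each pruning round retrains the surviving weights with its own cyclical schedule, and only the ceiling of that cycle changes from round to round, which matches the abstract's description of widening the gradient bottleneck with a larger LR as the network grows sparser.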