During the last decade, deep convolutional networks have become the reference for many machine learning tasks, especially in computer vision. However, their large computational requirements make them hard to deploy on resource-constrained hardware. Pruning has emerged as a standard way to compress such large networks. Yet, the severe perturbation caused by most pruning approaches is thought to hinder their efficacy. Drawing inspiration from Lagrangian Smoothing, we introduce a new technique, Selective Weight Decay (SWD), which achieves continuous pruning throughout training. Our approach deviates significantly from most methods in the literature, as it relies on a principle that can be applied in many different ways, for any problem, network, or pruning structure. We show that SWD compares favorably to other approaches in terms of performance/parameters ratio on the CIFAR-10 and ImageNet ILSVRC2012 datasets. On CIFAR-10 with unstructured pruning, for a parameters target of 0.1%, SWD attains a Top-1 accuracy of 81.32% while the reference method only reaches 27.78%. On CIFAR-10 with structured pruning, for a parameters target of 2.5%, the reference technique drops to 10% (random guess) while SWD maintains the Top-1 accuracy at 93.22%. On the ImageNet ILSVRC2012 dataset with unstructured pruning, for a parameters target of 2.5%, SWD attains 84.6% Top-5 accuracy instead of the 77.07% reached by the reference method.
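
To make the principle concrete, the following is a minimal sketch of one possible unstructured instantiation, written as an assumption rather than the paper's exact formulation: an additional L2 penalty, with a coefficient `mu` that is typically increased over training, is applied only to the weights that the current magnitude criterion would prune. The function name `swd_penalty` and the `keep_ratio`/`mu` parameters are illustrative choices, not identifiers from the paper.

```python
import torch

def swd_penalty(model, keep_ratio, mu):
    """Illustrative selective weight decay term (an assumption, not the
    authors' exact formulation): apply an extra L2 penalty only to the
    weights that would currently be pruned, i.e. all but the largest-
    magnitude fraction `keep_ratio` of the parameters."""
    all_weights = torch.cat([p.detach().view(-1).abs() for p in model.parameters()])
    n = len(all_weights)
    k = int(n * keep_ratio)  # number of weights to keep
    # Magnitude threshold below which weights are treated as pruning targets.
    threshold = torch.kthvalue(all_weights, n - k).values
    penalty = 0.0
    for p in model.parameters():
        mask = (p.detach().abs() <= threshold).float()
        penalty = penalty + (mask * p).pow(2).sum()
    return mu * penalty

# Usage inside a training loop (mu is grown over training so that the
# targeted weights are progressively driven toward zero before removal):
# loss = criterion(model(x), y) + swd_penalty(model, keep_ratio=0.025, mu=mu)
```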