In this paper, we study the importance of pruning in Deep Networks (DNs) and motivate it based on the current absence of data-aware weight initialization. Current DN initializations, which focus primarily on maintaining the first-order statistics of the feature maps through depth, force practitioners to overparametrize a model in order to reach high performance. This overparametrization can then be pruned a posteriori, leading to a phenomenon known as "winning tickets". However, the pruning literature still relies on empirical investigations, lacking a theoretical understanding of (1) how pruning affects the decision boundary, (2) how to interpret pruning, (3) how to design principled pruning techniques, and (4) how to theoretically study pruning. To tackle these questions, we propose to employ recent advances in the theoretical analysis of Continuous Piecewise Affine (CPA) DNs. From this viewpoint, we can study the DNs' input space partitioning and detect the early-bird (EB) phenomenon, guide practitioners by identifying when to stop the first training step, provide interpretability into current pruning techniques, and develop a principled pruning criterion for efficient DN training. Finally, we conduct extensive experiments to show the effectiveness of the proposed spline pruning criterion, in terms of both layerwise and global pruning, over state-of-the-art pruning methods.