Practitioners frequently observe that pruning improves model generalization. A long-standing hypothesis based on the bias-variance trade-off attributes this improvement to the reduction in model size. However, recent studies on over-parameterization characterize a new model-size regime in which larger models achieve better generalization. Pruning models in this over-parameterized regime leads to a contradiction: while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it. Motivated by this contradiction, we re-examine pruning's effect on generalization empirically. We show that size reduction cannot fully account for the generalization-improving effect of standard pruning algorithms. Instead, we find that pruning leads to better training at specific sparsities, achieving lower training loss than the dense model. We also find that pruning leads to additional regularization at other sparsities, reducing accuracy degradation from noisy examples relative to the dense model. Pruning both extends model training time and reduces model size; these two factors improve training and add regularization, respectively. We empirically demonstrate that both factors are essential to fully explaining pruning's impact on generalization.
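As a concrete illustration of what "pruning to a range of sparsities" involves, the sketch below implements unstructured magnitude pruning with NumPy. This is a minimal, generic example, not the exact procedure studied in the paper; the function name and interface are assumptions for illustration.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    sparsity: fraction of weights to remove, in [0, 1).
    Returns the pruned weight array and the boolean keep-mask.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to prune
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # Threshold at the k-th smallest magnitude; keep strictly larger entries.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Example: prune a random weight matrix to 90% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))
pruned, mask = magnitude_prune(w, 0.9)
print(mask.mean())  # fraction of weights kept
```

In practice, pruning pipelines apply a mask like this to each layer and then fine-tune the surviving weights, which is where the training-time and regularization effects discussed above come into play.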