Training deep neural networks with an $L_0$ regularization is one of the prominent approaches for network pruning or sparsification. The method prunes the network during training by encouraging weights to become exactly zero. However, recent work of Gale et al. reveals that although this method yields high compression rates on smaller datasets, it performs inconsistently on large-scale learning tasks, such as ResNet50 on ImageNet. We analyze this phenomenon through the lens of variational inference and find that it is likely due to the independent modeling of the binary gates, i.e., the mean-field approximation, which is known in Bayesian statistics to be a crude approximation that often leads to poor performance. To mitigate this deficiency, we propose to model the dependency among the binary gates, which can be realized effectively by a multi-layer perceptron (MLP). We term our algorithm Dep-$L_0$ as it prunes networks via a dependency-enabled $L_0$ regularization. Extensive experiments on CIFAR10, CIFAR100, and ImageNet with VGG16, ResNet50, and ResNet56 show that our Dep-$L_0$ outperforms the original $L_0$-HC algorithm of Louizos et al. by a significant margin, especially on ImageNet. Compared with state-of-the-art network sparsification algorithms, our dependency modeling makes the $L_0$-based sparsification once again very competitive on large-scale learning tasks. Our source code is available at https://github.com/leo-yangli/dep-l0.
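To make the idea of dependency modeling concrete, the following is a minimal PyTorch-style sketch, not the paper's exact architecture: a small MLP produces the hard concrete gate logits of one layer conditioned on the previous layer's sampled gates, replacing the independent (mean-field) per-gate parameters of the original $L_0$-HC. The class name DependentGates, the hidden size, and the conditioning scheme are illustrative assumptions; only the hard concrete reparameterization and the expected-$L_0$ penalty follow Louizos et al.

```python
import math
import torch
import torch.nn as nn

# Stretch/temperature constants of the hard concrete distribution (Louizos et al., 2018).
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0

class DependentGates(nn.Module):
    """Hypothetical sketch: gate logits for one layer produced by an MLP that is
    conditioned on the previous layer's sampled gates, rather than by
    independent (mean-field) per-gate parameters."""

    def __init__(self, prev_num_gates, num_gates, hidden=32):
        super().__init__()
        self.logit_net = nn.Sequential(
            nn.Linear(prev_num_gates, hidden), nn.ReLU(),
            nn.Linear(hidden, num_gates),
        )

    def forward(self, prev_gates):
        log_alpha = self.logit_net(prev_gates)
        u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
        # Hard concrete reparameterization: concrete sample, stretched, then clipped to [0, 1].
        s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / BETA)
        z = torch.clamp(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)
        # Expected L0 cost: probability that each gate is non-zero.
        l0_penalty = torch.sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA)).sum()
        return z, l0_penalty
```

In such a sketch, the sampled gates z would multiply the corresponding channels or neurons of the layer, and the summed l0_penalty terms across layers would be added to the training loss with a regularization weight.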