Dynamic model pruning is a recent direction that allows a different sub-network to be inferred for each input sample during deployment. However, current dynamic methods learn a continuous channel gating through regularization, by adding a sparsity-inducing loss. This formulation makes it difficult to balance the different losses (e.g., task loss and regularization loss). In addition, regularization-based methods lack a transparent way to select the tradeoff hyperparameter that realizes a given computational budget. Our contribution is twofold: 1) decoupled task and pruning training; 2) simple hyperparameter selection that enables estimating the FLOPs reduction before training. We propose to predict a mask that selects k filters to process in a layer based on the activation of its previous layer. We pose the problem as a self-supervised binary classification problem: each mask predictor module is trained to predict the log-likelihood that each filter in the current layer belongs to the top-k activated filters. The value k is dynamically estimated for each input based on a novel criterion using the mass of heatmaps. We show experiments on several neural architectures, such as VGG, ResNet, and MobileNet, on the CIFAR and ImageNet datasets. On CIFAR, we reach similar accuracy to SOTA methods with 15% and 24% higher FLOPs reduction. Similarly, on ImageNet, we achieve a lower drop in accuracy with up to 13% improvement in FLOPs reduction.
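The self-supervised targets described above can be illustrated with a small sketch. The abstract does not define the heatmap-mass criterion, so everything specific here is an assumption made for illustration: the mass of a filter is taken to be the sum of the absolute values in its heatmap, k is taken to be the smallest number of filters whose combined mass reaches a fraction `ratio` of the total, and the function names are hypothetical rather than the authors' code.

```python
from typing import List

Heatmap = List[List[float]]


def filter_mass(heatmap: Heatmap) -> float:
    """Mass of one filter's heatmap (assumed: sum of absolute activations)."""
    return sum(abs(v) for row in heatmap for v in row)


def estimate_k(masses: List[float], ratio: float) -> int:
    """Smallest k such that the k heaviest filters hold >= ratio of total mass.

    This cumulative-mass rule is an assumption, not the paper's stated criterion.
    """
    total = sum(masses)
    if total == 0.0:
        return 1  # degenerate input: keep a single filter
    running = 0.0
    for k, mass in enumerate(sorted(masses, reverse=True), start=1):
        running += mass
        if running >= ratio * total:
            return k
    return len(masses)


def topk_targets(masses: List[float], k: int) -> List[int]:
    """Binary labels: 1 for the k filters with the largest mass, else 0."""
    order = sorted(range(len(masses)), key=lambda i: masses[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(masses))]


# Toy layer with four filters, each producing a 2x2 heatmap for one input.
heatmaps = [
    [[4.0, 2.0], [0.0, 0.0]],   # mass 6
    [[1.0, 0.0], [0.0, 0.0]],   # mass 1
    [[1.0, -2.0], [0.0, 0.0]],  # mass 3
    [[0.0, 0.0], [0.0, 0.0]],   # mass 0
]
masses = [filter_mass(h) for h in heatmaps]
k = estimate_k(masses, ratio=0.8)
targets = topk_targets(masses, k)
```

Under these assumptions, the `targets` list would serve as the labels for a binary classifier that sees only the previous layer's activation, which is what allows the mask predictor to be trained separately from the task loss.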