Dynamic model pruning is a recent direction that allows a different sub-network to be inferred for each input sample during deployment. However, current dynamic methods rely on learning continuous channel gating through regularization with a sparsity-inducing loss. This formulation introduces complexity in balancing different losses (e.g., task loss, regularization loss). In addition, regularization-based methods lack a transparent way to select the tradeoff hyperparameter that realizes a given computational budget. Our contribution is two-fold: 1) decoupled task and pruning training, and 2) simple hyperparameter selection that enables FLOPs reduction estimation before training. Inspired by the Hebbian theory in neuroscience, "neurons that fire together wire together", we propose to predict a mask to process k filters in a layer based on the activation of its previous layer. We pose the problem as a self-supervised binary classification problem. Each mask predictor module is trained to predict the log-likelihood that each filter in the current layer belongs to the top-k activated filters. The value k is dynamically estimated for each input based on a novel criterion using the mass of heatmaps. We show experiments on several neural architectures, such as VGG, ResNet and MobileNet, on the CIFAR and ImageNet datasets. On CIFAR, we reach similar accuracy to SOTA methods with 15% and 24% higher FLOPs reduction. Similarly, on ImageNet, we achieve a lower drop in accuracy with up to 13% improvement in FLOPs reduction.
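To make the top-k target concrete, the following is a minimal sketch of one plausible reading of the heatmap-mass criterion, not the exact procedure of the method. It assumes that the mass of a filter is the sum of the absolute values of its activation map, and that k is the smallest number of filters whose cumulative mass reaches a fraction `mass_ratio` of the layer's total mass. The function name `topk_mask_from_mass` and the parameter `mass_ratio` are hypothetical and introduced only for illustration.

```python
import numpy as np


def topk_mask_from_mass(activations, mass_ratio=0.9):
    """Return (mask, k) for one input sample.

    activations: array of shape (C, H, W), one heatmap per filter.
    mass_ratio:  assumed hyperparameter in (0, 1]; fraction of the total
                 heatmap mass that the kept filters must cover.
    """
    num_filters = activations.shape[0]
    # Assumption: "mass" of a heatmap = sum of absolute activations.
    mass = np.abs(activations).reshape(num_filters, -1).sum(axis=1)
    total = mass.sum()
    mask = np.zeros(num_filters, dtype=bool)
    if total == 0:
        # Nothing fired in this layer, so no filter is selected.
        return mask, 0
    # Rank filters from most to least activated.
    order = np.argsort(-mass, kind="stable")
    cumulative = np.cumsum(mass[order])
    # Smallest k whose cumulative mass reaches the requested fraction.
    k = int(np.searchsorted(cumulative, mass_ratio * total, side="left")) + 1
    k = min(k, num_filters)
    mask[order[:k]] = True
    return mask, k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Post-ReLU style activations for a layer with 8 filters.
    feats = np.maximum(rng.normal(size=(8, 4, 4)), 0.0)
    mask, k = topk_mask_from_mass(feats, mass_ratio=0.8)
    print("k =", k, "mask =", mask.astype(int))
```

Under these assumptions, the binary mask serves as the self-supervised target: a mask predictor fed with the previous layer's activations can be trained with a binary classification loss against it, independently of the task loss. Because k follows from `mass_ratio` and the observed activations, the expected number of processed filters can be measured on sample data before any predictor is trained.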