Dynamic model pruning is a recent direction that allows inferring a different sub-network for each input sample during deployment. However, current dynamic methods rely on learning continuous channel gating through regularization, i.e., by inducing a sparsity loss. This formulation introduces complexity in balancing different losses (e.g., the task loss and the regularization loss). In addition, regularization-based methods lack a transparent way to select the tradeoff hyperparameter that realizes a given computational budget. Our contribution is two-fold: 1) decoupled task and pruning losses, and 2) simple hyperparameter selection that enables FLOPs-reduction estimation before training. Inspired by Hebbian theory in neuroscience ("neurons that fire together wire together"), we propose to predict a mask that selects k filters to process in a layer based on the activations of its previous layer. We pose the problem as a self-supervised binary classification problem: each mask-predictor module is trained to predict, for each filter in the current layer, whether its log-likelihood places it among the top-k activated filters. The value of k is estimated dynamically for each input based on a novel criterion using the mass of heatmaps. We show experiments on several neural architectures, such as VGG, ResNet, and MobileNet, on the CIFAR and ImageNet datasets. On CIFAR, we reach accuracy similar to SOTA methods with 15% and 24% higher FLOPs reduction. Similarly, on ImageNet, we achieve a lower drop in accuracy with up to 13% improvement in FLOPs reduction.
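To make the self-supervised mask-prediction idea concrete, here is a minimal PyTorch sketch. It assumes a lightweight linear gate over globally pooled activations of the previous layer; `MaskPredictor`, `topk_labels`, and the stand-in activation magnitudes are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskPredictor(nn.Module):
    """Hypothetical gate: scores the current layer's C_out filters from the
    previous layer's activations; trained separately from the task loss."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.fc = nn.Linear(c_in, c_out)   # lightweight linear predictor

    def forward(self, prev_act):
        # prev_act: (B, C_in, H, W) -> global average pool -> (B, C_in)
        pooled = prev_act.mean(dim=(2, 3))
        return self.fc(pooled)             # per-filter logits, (B, C_out)

def topk_labels(act_norms, k):
    """Self-supervised binary targets: 1 for the k most activated filters."""
    idx = act_norms.topk(k, dim=1).indices                 # (B, k)
    return torch.zeros_like(act_norms).scatter_(1, idx, 1.0)

# One training step of the predictor, decoupled from the task loss:
B, C_in, C_out, H, W, k = 8, 64, 128, 16, 16, 32
predictor = MaskPredictor(C_in, C_out)
prev_act = torch.randn(B, C_in, H, W)
act_norms = torch.randn(B, C_out).abs()    # stand-in per-filter magnitudes
loss = F.binary_cross_entropy_with_logits(
    predictor(prev_act), topk_labels(act_norms, k))
loss.backward()
```

Because the predictor is trained with its own binary cross-entropy objective against top-k membership, no sparsity regularizer needs to be balanced against the task loss, which is the decoupling the abstract claims.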
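The heatmap-mass criterion for choosing k is only described at a high level in the abstract. A plausible reading, sketched below under the assumption that k is the smallest number of filters whose activation mass covers a fraction tau of the input's total mass, would look like this (`dynamic_k` and `tau` are hypothetical names):

```python
import torch

def dynamic_k(act, tau=0.9, k_min=1):
    """Assumed criterion: smallest k whose most-activated filters account
    for a fraction tau of this input's total activation (heatmap) mass."""
    mass = act.abs().sum(dim=(2, 3))              # per-filter mass, (B, C)
    sorted_mass, _ = mass.sort(dim=1, descending=True)
    cum = sorted_mass.cumsum(dim=1)
    frac = cum / cum[:, -1:].clamp_min(1e-12)     # cumulative mass fraction
    k = (frac < tau).sum(dim=1) + 1               # first index reaching tau
    return k.clamp(min=k_min)                     # per-input k, shape (B,)

# Example: a batch of 4 feature maps with 128 filters each.
k = dynamic_k(torch.randn(4, 128, 16, 16), tau=0.9)
```

Under this reading, tau acts as the transparent budget knob: since the expected k per layer can be measured on a few unlabeled batches before training, the resulting FLOPs reduction can be estimated in advance, as the abstract claims.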