Due to data privacy issues, accelerating networks with tiny training sets has become a critical need in practice. Previous methods have achieved promising empirical results via filter-level pruning. In this paper, we both study this problem theoretically and propose an effective algorithm that aligns well with our theoretical results. First, we propose the finetune convexity hypothesis to explain why recent few-shot compression algorithms do not suffer from the overfitting problem. Based on it, a theory is further established to explain these methods for the first time. Compared with naively finetuning a pruned network, feature mimicking is proved to achieve a lower variance of parameters and hence enjoys easier optimization. With our theoretical conclusions, we claim that dropping blocks is a fundamentally superior few-shot compression scheme, offering both more convex optimization and a higher acceleration ratio. To choose which blocks to drop, we propose a new metric, recoverability, to effectively measure the difficulty of recovering the compressed network. Finally, we propose an algorithm named PRACTISE to accelerate networks using only tiny training sets. PRACTISE outperforms previous methods by a significant margin: for a 22% latency reduction, it surpasses previous methods by 7 percentage points on average on ImageNet-1k. It also works well under data-free and out-of-domain data settings. Our code is at https://github.com/DoctorKey/Practise.
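To make the block-dropping-plus-feature-mimicking idea concrete, below is a minimal illustrative sketch (not the authors' implementation): one residual block of a torchvision ResNet-34 is replaced by an identity mapping, and the compressed network is then recovered by matching its penultimate features to those of the original network with an MSE loss on a tiny training set. The choice of block (`layer3[3]`), the random stand-in data, and the optimizer settings are all hypothetical assumptions for illustration only.

```python
# Illustrative sketch: block dropping + feature mimicking on a tiny set.
# Not the PRACTISE implementation; all concrete choices are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet34

teacher = resnet34(weights="IMAGENET1K_V1").eval()   # original (uncompressed) network
student = resnet34(weights="IMAGENET1K_V1")          # network to be accelerated

# Block dropping: replace a residual block whose input and output shapes
# match with an identity mapping (in contrast to filter-level pruning).
student.layer3[3] = nn.Identity()

def features(model, x):
    # Penultimate (pre-classifier) features of a torchvision ResNet.
    x = model.conv1(x); x = model.bn1(x); x = model.relu(x); x = model.maxpool(x)
    x = model.layer1(x); x = model.layer2(x); x = model.layer3(x); x = model.layer4(x)
    return torch.flatten(model.avgpool(x), 1)

# A random tensor stands in for a tiny training set of a few dozen images.
tiny_set = torch.randn(64, 3, 224, 224)
opt = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

for step in range(100):
    idx = torch.randint(0, tiny_set.size(0), (16,))
    x = tiny_set[idx]
    with torch.no_grad():
        target = features(teacher, x)                # original network's features
    loss = nn.functional.mse_loss(features(student, x), target)  # feature mimicking
    opt.zero_grad(); loss.backward(); opt.step()
```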