Structured pruning is a commonly used technique for deploying deep neural networks (DNNs) on resource-constrained devices. However, existing pruning methods are usually heuristic, task-specific, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performance and significant FLOPs reduction by Only-Train-Once (OTO). OTO has two key components: (i) we partition the parameters of a DNN into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we formulate a structured-sparsity optimization problem and propose a novel optimization algorithm, Half-Space Stochastic Projected Gradient (HSPG), to solve it, which outperforms standard proximal methods in group-sparsity exploration while maintaining comparable convergence. To demonstrate the effectiveness of OTO, we train and compress full models simultaneously from scratch, without fine-tuning, for inference speedup and parameter reduction, and achieve state-of-the-art results on VGG16 for CIFAR10, ResNet50 for CIFAR10/ImageNet, and BERT for SQuAD.
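As a rough illustration of the two ingredients named above, the sketch below (not the authors' released implementation; the group construction, step size, regularization weight, and the schedule for switching between plain stochastic steps and projected steps are all simplified assumptions) shows a group-lasso regularized subgradient step followed by a half-space projection that sets an entire parameter group exactly to zero. In OTO, each group would be a zero-invariant group, e.g., a convolutional filter together with its bias and the matching BatchNorm scale and shift, so that zeroing the whole group leaves the network output unchanged.

```python
import torch

# Hypothetical sketch (not the authors' code): one HSPG-style update on
# parameters partitioned into groups. Each group is a flat tensor; in OTO
# these would be zero-invariant groups, so a group driven to zero can be
# pruned without changing the model's output.

def hspg_step(groups, grads, lr=0.1, lam=1e-3, eps=0.0):
    """One subgradient step on f(x) + lam * sum_g ||x_g||_2, followed by a
    half-space projection that maps whole groups exactly to zero."""
    new_groups = []
    for x_g, g_g in zip(groups, grads):
        norm = x_g.norm()
        # Subgradient of the group-lasso term (zero direction for a zero group).
        reg_grad = x_g / norm if norm > 0 else torch.zeros_like(x_g)
        x_hat = x_g - lr * (g_g + lam * reg_grad)
        # Half-space projection: if the trial point leaves the half-space
        # {z : <z, x_g> >= eps * ||x_g||^2}, project the whole group to zero.
        if torch.dot(x_hat.flatten(), x_g.flatten()) < eps * norm ** 2:
            x_hat = torch.zeros_like(x_g)
        new_groups.append(x_hat)
    return new_groups
```

Compared with a standard proximal (soft-thresholding) step, the half-space test zeroes a group based on the direction of the trial point relative to the current iterate, which is what lets the method explore group sparsity aggressively while keeping ordinary stochastic-gradient convergence behavior.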