Structured pruning is a commonly used technique for deploying deep neural networks (DNNs) on resource-constrained devices. However, existing pruning methods are usually heuristic, task-specific, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performance and significant FLOPs reductions by Only-Train-Once (OTO). OTO contains two key components: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we formulate a structured-sparsity optimization problem and propose a novel optimization algorithm, Half-Space Stochastic Projected Gradient (HSPG), to solve it, which outperforms standard proximal methods in group-sparsity exploration while maintaining comparable convergence. To demonstrate the effectiveness of OTO, we train and compress full models simultaneously from scratch without fine-tuning for inference speedup and parameter reduction, and achieve state-of-the-art results on VGG16 for CIFAR10, ResNet50 for CIFAR10, and BERT for SQuAD, as well as competitive results on ResNet50 for ImageNet. The source code is available at https://github.com/tianyic/only_train_once.
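For intuition only, the following is a minimal NumPy sketch of the half-space projection idea that distinguishes HSPG from standard proximal methods: each parameter group takes a trial (sub)gradient step, and whenever the trial point leaves the half-space defined by the current group direction, the whole group is projected to exactly zero, which is what yields group-level (structured) sparsity. The function name, step size, regularization weight, threshold, and toy data are hypothetical simplifications under our own assumptions, not the implementation in the repository above.

```python
import numpy as np

def half_space_step(x_groups, grad_groups, lr=0.1, lam=1e-2, eps=0.0):
    """One hypothetical HSPG-style update over a list of parameter groups."""
    new_groups = []
    for x_g, g_g in zip(x_groups, grad_groups):
        norm = np.linalg.norm(x_g)
        if norm > 0:
            # subgradient of a group-sparsity regularizer lam * ||x_g||_2
            g_g = g_g + lam * x_g / norm
        trial = x_g - lr * g_g  # plain stochastic-gradient trial step
        # Half-space test: if the trial point falls outside the half-space
        # defined by the current group direction, zero out the entire group.
        if np.dot(trial, x_g) < eps * norm ** 2:
            trial = np.zeros_like(x_g)
        new_groups.append(trial)
    return new_groups

# Toy usage: two 3-parameter groups; the second has a large gradient pushing it
# through the origin, so the half-space projection zeroes it out as a whole group.
groups = [np.array([1.0, -0.5, 0.3]), np.array([0.05, 0.02, -0.01])]
grads  = [np.array([0.1,  0.0, 0.0]), np.array([2.0,  1.0,  -0.5])]
print(half_space_step(groups, grads))
```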