The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning. CHITA's main workhorse performs combinatorial optimization updates on a memory-friendly representation of local quadratic approximation(s) of the loss function. On a standard benchmark of pretrained models and datasets, CHITA leads to significantly better sparsity-accuracy tradeoffs than competing methods. For example, for MLPNet with only 2% of the weights retained, our approach improves the accuracy by 63% relative to the state of the art. Furthermore, when used in conjunction with fine-tuning SGD steps, our method achieves significant accuracy gains over the state-of-the-art approaches.
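For context, the classical Optimal Brain Surgeon framework that CHITA extends works with a local quadratic approximation of the loss around the pretrained weights $\bar{w}$; a standard form of this approximation (the exact formulation used by CHITA may differ) is

$$
\mathcal{L}(\bar{w} + \delta w) \;\approx\; \mathcal{L}(\bar{w}) + \nabla \mathcal{L}(\bar{w})^\top \delta w + \tfrac{1}{2}\, \delta w^\top H\, \delta w, \qquad H = \nabla^2 \mathcal{L}(\bar{w}).
$$

In the classical setting, pruning a single weight $q$ (setting $\bar{w}_q + \delta w_q = 0$ while optimally adjusting the remaining weights) incurs the well-known saliency cost $\bar{w}_q^{\,2} / \bigl(2\,[H^{-1}]_{qq}\bigr)$, whereas the sparsity-constrained formulation described above jointly selects and updates multiple weights at once.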