Vision transformers have achieved competitive performance on a variety of computer vision applications. However, their storage, run-time memory, and computational demands hinder deployment on mobile devices. Here we present a vision transformer pruning approach, which identifies the impact of each dimension in every layer of the transformer and then executes pruning accordingly. By encouraging dimension-wise sparsity in the transformer, important dimensions emerge automatically. A large number of dimensions with small importance scores can be discarded to achieve a high pruning ratio without significantly compromising accuracy. The pipeline for vision transformer pruning is as follows: 1) training with sparsity regularization; 2) pruning the dimensions of the linear projections; 3) fine-tuning. The parameter and FLOPs reduction ratios of the proposed algorithm are evaluated and analyzed on the ImageNet dataset to demonstrate the effectiveness of our proposed method.
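Step 2 of the pipeline can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): it assumes that per-dimension importance scores have already been learned under an L1 sparsity penalty during training, and it simply keeps the top-scoring output dimensions of a linear projection's weight matrix.

```python
import numpy as np

def prune_linear_dims(weight, scores, keep_ratio):
    """Prune the output dimensions of a linear projection by importance score.

    weight:     (out_dim, in_dim) matrix of a linear projection.
    scores:     (out_dim,) per-dimension importance scores, assumed to have
                been learned with an L1 sparsity regularizer.
    keep_ratio: fraction of dimensions to retain, in (0, 1].

    Returns the pruned weight matrix and the kept dimension indices.
    """
    out_dim = weight.shape[0]
    k = max(1, int(round(out_dim * keep_ratio)))
    # Indices of the k largest scores, kept in their original order.
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return weight[keep], keep

# Hypothetical usage: prune half the output dimensions of a 4x3 projection.
w = np.arange(12, dtype=float).reshape(4, 3)
s = np.array([0.9, 0.1, 0.5, 0.05])  # importance scores per output dimension
pruned_w, kept = prune_linear_dims(w, s, keep_ratio=0.5)
```

After pruning, the reduced matrices are fine-tuned (step 3) to recover any accuracy lost by discarding low-importance dimensions.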