Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by updating only a few parameters so as to improve storage efficiency, called parameter-efficient transfer learning (PETL). Current PETL methods have shown that by tuning only 0.5% of the parameters, ViT can be adapted to downstream tasks with even better performance than full fine-tuning. In this paper, we aim to further promote the parameter efficiency of PETL to meet the extreme storage constraint in real-world applications. To this end, we propose a tensorization-decomposition framework to store the weight increments, in which the weights of each ViT are tensorized into a single 3D tensor, and their increments are then decomposed into lightweight factors. In the fine-tuning process, only the factors need to be updated and stored, termed Factor-Tuning (FacT). On the VTAB-1K benchmark, our method performs on par with NOAH, the state-of-the-art PETL method, while being 5x more parameter-efficient. We also present a tiny version that uses only 8K trainable parameters (0.01% of ViT's parameters) but outperforms full fine-tuning and many other PETL methods such as VPT and BitFit. In few-shot settings, FacT also beats all PETL baselines using the fewest parameters, demonstrating its strong capability in the low-data regime.
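To make the factorization idea concrete, below is a minimal PyTorch sketch of one adapted linear layer under illustrative assumptions: the pre-trained weight is frozen, its increment is built as ΔW = s · U Σ V^T from factors U and V shared across layers plus a small per-layer core Σ, and only these factors are trained. The class name FacTLinear, the rank r, the scale s, and the initialization are hypothetical choices for illustration, not the paper's exact formulation of the 3D-tensor decomposition.

```python
import torch
import torch.nn as nn

class FacTLinear(nn.Module):
    """Illustrative sketch: a frozen linear layer whose weight increment
    delta_W = scale * U @ core @ V.T is built from factors U, V shared
    across layers and a small per-layer core. Only the factors are trained."""
    def __init__(self, base: nn.Linear, U: nn.Parameter, V: nn.Parameter,
                 rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # freeze the pre-trained weight
        self.U, self.V = U, V                  # shared factors, each d x r
        self.core = nn.Parameter(torch.zeros(rank, rank))  # per-layer core
        self.scale = scale

    def forward(self, x):
        delta_w = self.scale * self.U @ self.core @ self.V.T  # d x d increment
        return self.base(x) + x @ delta_w.T

# Hypothetical usage with ViT-B-like dimensions (hidden size 768, 197 tokens).
d, r = 768, 8
U = nn.Parameter(torch.randn(d, r) * 0.02)     # shared across adapted layers
V = nn.Parameter(torch.randn(d, r) * 0.02)
layer = FacTLinear(nn.Linear(d, d), U, V, rank=r)
out = layer(torch.randn(4, 197, d))            # (batch, tokens, dim)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)                    # factors are the only trainables
```

Because U and V are reused by every adapted layer while each layer adds only a tiny core, the number of stored parameters grows very slowly with depth, which is the source of the storage savings the abstract describes.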