Recent studies on transfer learning have shown that selectively fine-tuning a subset of layers or customizing different learning rates for each layer can greatly improve robustness to out-of-distribution (OOD) data and retain the generalization capability of the pre-trained model. However, most of these methods rely on manually crafted heuristics or expensive hyper-parameter searches, which prevents them from scaling up to large datasets and neural networks. To solve this problem, we propose the Trainable Projected Gradient Method (TPGM), which automatically learns the constraint imposed on each layer for fine-grained fine-tuning regularization. This is motivated by formulating fine-tuning as a bi-level constrained optimization problem. Specifically, TPGM maintains a set of projection radii, i.e., distance constraints between the fine-tuned model and the pre-trained model, one for each layer, and enforces them through weight projections. To learn these constraints, we propose a bi-level optimization scheme that finds the best set of projection radii in an end-to-end manner. Theoretically, we show that the bi-level optimization formulation can explain the regularization capability of TPGM. Empirically, with little hyper-parameter search cost, TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance. For example, when fine-tuned on DomainNet-Real and ImageNet, TPGM achieves $22\%$ and $10\%$ relative OOD improvement over vanilla fine-tuning on their sketch counterparts, respectively. Code is available at \url{https://github.com/PotatoTian/TPGM}.
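To make the projection mechanism concrete, the following is a minimal PyTorch-style sketch of the per-layer weight projection described above, assuming a fixed set of radii; in TPGM the radii are themselves learned through the bi-level optimization, which is omitted here. The names \texttt{project\_weights} and \texttt{gammas} are illustrative and are not taken from the released code.

\begin{verbatim}
import torch

@torch.no_grad()
def project_weights(model, pretrained_state, gammas, eps=1e-12):
    """Project each layer of `model` onto an L2 ball of radius gammas[name]
    centered at the pre-trained weights pretrained_state[name]."""
    for name, param in model.named_parameters():
        w0 = pretrained_state[name]      # pre-trained weights for this layer
        diff = param - w0                # deviation from the pre-trained model
        norm = diff.norm()               # per-layer L2 distance
        radius = gammas[name]            # learnable in TPGM; fixed in this sketch
        if norm > radius:                # outside the constraint set: project back
            param.copy_(w0 + diff * (radius / (norm + eps)))
\end{verbatim}

In practice such a projection would be applied after each optimizer update (e.g., right after \texttt{optimizer.step()}), so that the fine-tuned weights never drift farther from the pre-trained model than each layer's radius allows.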