Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning methods substantially improve accuracy on a given target distribution, they often reduce robustness to distribution shifts. We address this tension by introducing a simple and effective method for improving robustness while fine-tuning: ensembling the weights of the zero-shot and fine-tuned models (WiSE-FT). Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements under distribution shift, while preserving high accuracy on the target distribution. On ImageNet and five derived distribution shifts, WiSE-FT improves accuracy under distribution shift by 4 to 6 percentage points (pp) over prior work while increasing ImageNet accuracy by 1.6 pp. WiSE-FT achieves similarly large robustness gains (2 to 23 pp) on a diverse set of six further distribution shifts, and accuracy gains of 0.8 to 3.3 pp compared to standard fine-tuning on seven commonly used transfer learning datasets. These improvements come at no additional computational cost during fine-tuning or inference.
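The ensembling step described above is a simple linear interpolation in weight space between the zero-shot and fine-tuned parameters. Below is a minimal sketch, assuming two PyTorch models with identical architectures; the function name `wise_ft` and the mixing coefficient `alpha` are illustrative names, not the paper's reference implementation.

```python
import torch

def wise_ft(zero_shot: torch.nn.Module,
            fine_tuned: torch.nn.Module,
            alpha: float = 0.5) -> dict:
    """Linearly interpolate the weights of a zero-shot and a fine-tuned model.

    alpha = 0 recovers the zero-shot model, alpha = 1 the fine-tuned model.
    """
    theta_0 = zero_shot.state_dict()
    theta_1 = fine_tuned.state_dict()
    assert theta_0.keys() == theta_1.keys(), "architectures must match"
    # Element-wise interpolation of every parameter tensor.
    return {k: (1 - alpha) * theta_0[k] + alpha * theta_1[k] for k in theta_0}

# Usage: load the interpolated weights into a model with the same architecture,
# then run inference as usual (no extra cost at fine-tuning or inference time).
# model.load_state_dict(wise_ft(zero_shot_model, fine_tuned_model, alpha=0.5))
```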