SGD (with momentum) and AdamW are the two most commonly used optimizers for fine-tuning large neural networks in computer vision. When the two methods perform equally well, SGD is preferable because it uses less memory (12 bytes/parameter) than AdamW (16 bytes/parameter). However, on a suite of downstream tasks, especially those with distribution shifts, we show that fine-tuning with AdamW performs substantially better than SGD on modern Vision Transformer and ConvNeXt models. We find that large gaps in performance between SGD and AdamW occur when the fine-tuning gradients in the first "embedding" layer are much larger than in the rest of the model. Our analysis suggests an easy fix that works consistently across datasets and models: merely freezing the embedding layer (less than 1\% of the parameters) leads to SGD performing competitively with AdamW while using less memory. Our insights result in state-of-the-art accuracies on five popular distribution shift benchmarks: WILDS-FMoW, WILDS-Camelyon, Living-17, Waterbirds, and DomainNet.
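To illustrate the proposed fix, here is a minimal PyTorch-style sketch of freezing the embedding layer before fine-tuning with SGD. It assumes a timm Vision Transformer whose first "embedding" layer is exposed as `model.patch_embed`; the model name, learning rate, and other hyperparameters are illustrative, not the paper's exact configuration.

```python
import timm
import torch

# Load a pretrained ViT for fine-tuning (illustrative model and head size).
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=10)

# Freeze the first "embedding" layer (the patch embedding for ViT),
# which holds less than 1% of the parameters.
for p in model.patch_embed.parameters():
    p.requires_grad = False

# Fine-tune the remaining parameters with SGD + momentum,
# which stores only weights, gradients, and momentum (12 bytes/parameter).
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=3e-3,
    momentum=0.9,
)
```

For a ConvNeXt model the analogous step would freeze the stem (e.g., `model.stem` in timm), but the exact attribute name depends on the architecture implementation.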