Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks. In this work, we show that while Transformers tend to have larger model capacity, their generalization can be worse than convolutional networks due to the lack of the right inductive bias. To effectively combine the strengths from both architectures, we present CoAtNets (pronounced "coat" nets), a family of hybrid models built from two key insights: (1) depthwise Convolution and self-Attention can be naturally unified via simple relative attention; (2) vertically stacking convolution layers and attention layers in a principled way is surprisingly effective in improving generalization, capacity, and efficiency. Experiments show that our CoAtNets achieve state-of-the-art performance under different resource constraints across various datasets. For example, CoAtNet achieves 86.0% ImageNet top-1 accuracy without extra data, and 89.77% with extra JFT data, outperforming prior arts of both convolutional networks and Transformers. Notably, when pre-trained with 13M images from ImageNet-21K, our CoAtNet achieves 88.56% top-1 accuracy, matching ViT-huge pre-trained with 300M images from JFT while using 23x less data.
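To make insight (1) concrete, below is a minimal, illustrative sketch of the pre-softmax relative attention that unifies a depthwise-convolution-style position term with self-attention, i.e. y_i = sum_j softmax_j(x_i · x_j + w_{i-j}) x_j. It assumes a 1-D token sequence and a scalar bias per relative offset for simplicity; the function and parameter names (`relative_attention`, `w_rel`) are placeholders for this sketch, not the paper's implementation.

```python
import numpy as np

def relative_attention(x, w_rel):
    """Sketch of relative attention on a 1-D sequence.

    x:     (L, d) array of input tokens
    w_rel: (2L - 1,) learned scalar bias indexed by relative offset i - j
           (the convolution-like, input-independent term)
    Returns the (L, d) array y with y_i = sum_j softmax_j(x_i . x_j + w_{i-j}) x_j.
    """
    L, d = x.shape
    logits = x @ x.T                                          # input-dependent attention logits
    offsets = np.arange(L)[:, None] - np.arange(L)[None, :]   # relative offsets i - j
    logits = logits + w_rel[offsets + L - 1]                  # add static relative-position bias
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)            # softmax over j
    return attn @ x                                           # attention-weighted sum of values

# Toy usage: with an all-zero bias this reduces to plain (unscaled) self-attention,
# while the bias alone plays the role of a depthwise-convolution kernel.
x = np.random.randn(8, 16).astype(np.float32)
w = np.zeros(2 * 8 - 1, dtype=np.float32)
y = relative_attention(x, w)
print(y.shape)  # (8, 16)
```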