An Image Patch is a Wave: Phase-Aware Vision MLP - 专知 (Zhuanzhi) Papers


November 25, 2021


Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe Wang

Unlike traditional convolutional neural networks (CNNs) and vision transformers, the multilayer perceptron (MLP) is a new kind of vision model with an extremely simple architecture built only from stacked fully-connected layers. The input image of a vision MLP is usually split into multiple tokens (patches), and existing MLP models aggregate them directly with fixed weights, neglecting the fact that the semantic information of tokens varies from image to image. To aggregate tokens dynamically, we propose to represent each token as a wave function with two parts, amplitude and phase. The amplitude is the original feature, and the phase term is a complex value that changes according to the semantic content of the input image. Introducing the phase term dynamically modulates the relationship between tokens and the fixed weights in the MLP. Based on this wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation.
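The abstract describes the wave-like token only in words, so the following NumPy sketch is one possible reading of it, not the authors' implementation. It assumes that each token is written as amplitude times exp(i * phase), that the complex value is unfolded with Euler's formula into a cosine part and a sine part, and that each part is mixed across tokens with its own fixed weight matrix. The names `wave_token_mixing`, `w_real`, `w_imag`, and the linear phase estimator `w_phase` are hypothetical and chosen for illustration only.

```python
import numpy as np


def wave_token_mixing(amplitude, phase, w_real, w_imag):
    """Aggregate tokens that are represented as waves (illustrative sketch).

    amplitude: (n_tokens, dim) real-valued token features
    phase:     (n_tokens, dim) phase angles in radians
    w_real:    (n_out, n_tokens) fixed mixing weights for the cosine part
    w_imag:    (n_out, n_tokens) fixed mixing weights for the sine part
    """
    # Euler's formula: a * exp(i * theta) = a * cos(theta) + i * a * sin(theta)
    real_part = amplitude * np.cos(phase)
    imag_part = amplitude * np.sin(phase)
    # Assumption: both parts are mixed with fixed weights and summed into a
    # real-valued output, so the phase modulates how strongly tokens combine.
    return w_real @ real_part + w_imag @ imag_part


rng = np.random.default_rng(0)
n_tokens, dim = 4, 8
amplitude = rng.normal(size=(n_tokens, dim))

# Hypothetical phase estimator: a linear map of the token features, so the
# phase depends on the content of the input rather than being fixed.
w_phase = rng.normal(size=(dim, dim)) * 0.1
phase = amplitude @ w_phase

w_real = rng.normal(size=(n_tokens, n_tokens))
w_imag = rng.normal(size=(n_tokens, n_tokens))

out = wave_token_mixing(amplitude, phase, w_real, w_imag)
print(out.shape)  # (4, 8)
```

Under these assumptions, setting every phase to zero reduces the function to `w_real @ amplitude`, which is plain fixed-weight token mixing. Two tokens with equal amplitude and a phase difference of pi cancel in the cosine part, which illustrates how an input-dependent phase can change the effect of weights that are themselves fixed.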

