Hire-MLP: Vision MLP via Hierarchical Rearrangement - 专知 Papers

Vision · INFORMS · Backbone · Extensibility · Model evaluation

November 30, 2021

Hire-MLP: Vision MLP via Hierarchical Rearrangement

Jianyuan Guo,Yehui Tang,Kai Han,Xinghao Chen,Han Wu,Chao Xu,Chang Xu,Yunhe Wang

Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image patches as input, which makes them inflexible with respect to input size and poor at capturing spatial information. Such an approach keeps MLPs from matching the performance of their transformer-based counterparts and prevents them from becoming a general backbone for computer vision. This paper presents Hire-MLP, a simple yet competitive vision MLP architecture built on Hierarchical rearrangement (hence "Hire"), which contains two levels of rearrangement. Specifically, the inner-region rearrangement is proposed to capture local information inside a spatial region, and the cross-region rearrangement is proposed to enable information communication between different regions and to capture global context by circularly shifting all tokens along spatial directions. Extensive experiments demonstrate the effectiveness of Hire-MLP as a versatile backbone for various vision tasks. In particular, Hire-MLP achieves competitive results on image classification, object detection, and semantic segmentation, e.g., 83.8% top-1 accuracy on ImageNet, 51.7% box AP and 44.8% mask AP on COCO val2017, and 49.9% mIoU on ADE20K, surpassing previous transformer-based and MLP-based models with a better trade-off between accuracy and throughput. Code is available at https://github.com/ggjy/Hire-Wave-MLP.pytorch.
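The two rearrangement levels described in the abstract can be illustrated with a minimal pure-Python sketch along a single spatial axis. This is not the authors' implementation: the function names (`circular_shift`, `inner_region_mix`, `hire_axis`), the choice to concatenate the channels of adjacent tokens inside a region, and the `mean_mix` stand-in for a learned MLP are all assumptions made for illustration only.

```python
from typing import Callable, List

Token = List[float]  # one token = a vector of C channel values


def circular_shift(tokens: List[Token], step: int) -> List[Token]:
    """Cross-region step (assumed form): roll tokens along one axis, wrapping around."""
    if not tokens:
        return []
    step %= len(tokens)
    return tokens[-step:] + tokens[:-step] if step else list(tokens)


def inner_region_mix(
    tokens: List[Token],
    region: int,
    mix: Callable[[List[float]], List[float]],
) -> List[Token]:
    """Inner-region step (assumed form): concatenate the channels of `region`
    adjacent tokens, apply `mix` to the joint vector, then split it back."""
    if len(tokens) % region != 0:
        raise ValueError("number of tokens must be divisible by the region size")
    channels = len(tokens[0])
    out: List[Token] = []
    for start in range(0, len(tokens), region):
        group = tokens[start:start + region]
        flat = [value for token in group for value in token]
        mixed = mix(flat)
        if len(mixed) != len(flat):
            raise ValueError("mix must preserve the vector length")
        # Restore the original token layout from the mixed vector.
        out.extend(mixed[i * channels:(i + 1) * channels] for i in range(region))
    return out


def hire_axis(
    tokens: List[Token],
    region: int,
    step: int,
    mix: Callable[[List[float]], List[float]],
) -> List[Token]:
    """Shift, mix inside each region, then shift back to the original positions."""
    shifted = circular_shift(tokens, step)
    mixed = inner_region_mix(shifted, region, mix)
    return circular_shift(mixed, -step)


def mean_mix(flat: List[float]) -> List[float]:
    """Hypothetical stand-in for a learned MLP: replace every value by the mean."""
    mean = sum(flat) / len(flat)
    return [mean] * len(flat)


if __name__ == "__main__":
    column = [[1.0], [2.0], [3.0], [4.0]]  # four tokens, one channel each
    print(inner_region_mix(column, 2, mean_mix))  # regions {1,2} and {3,4} stay separate
    print(hire_axis(column, 2, 1, mean_mix))      # shifting lets tokens cross region borders
```

With a region size of 2 and no shift, tokens 2 and 3 never interact because they sit in different regions. Shifting by one position before the region mix regroups the tokens as {4, 1} and {2, 3}, which is the kind of cross-region communication the abstract attributes to circular shifting.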
