Recently, MLP-like networks have been revived for image recognition. However, whether a generic MLP-like architecture can be built for the video domain has not been explored, due to the complexity of spatial-temporal modeling and its heavy computational burden. To fill this gap, we present an efficient self-attention-free backbone, namely MorphMLP, which flexibly leverages concise Fully-Connected (FC) layers for video representation learning. Specifically, a MorphMLP block consists of two key layers in sequence, i.e., MorphFC_s and MorphFC_t, for spatial and temporal modeling respectively. MorphFC_s effectively captures the core semantics in each frame through progressive token interaction along both the height and width dimensions. Meanwhile, MorphFC_t adaptively learns long-term dependencies across frames through temporal token aggregation at each spatial location. With such multi-dimensional and multi-scale factorization, the MorphMLP block achieves a strong accuracy-computation balance. Finally, we evaluate MorphMLP on a number of popular video benchmarks. Compared with recent state-of-the-art models, MorphMLP significantly reduces computation while delivering better accuracy: e.g., MorphMLP-S uses only 50% of the GFLOPs of VideoSwin-T yet achieves a 0.9% top-1 improvement on Kinetics400, under ImageNet1K pretraining; MorphMLP-B uses only 43% of the GFLOPs of MViT-B yet achieves a 2.4% top-1 improvement on SSV2, even though MorphMLP-B is pretrained on ImageNet1K while MViT-B is pretrained on Kinetics400. Moreover, when adapted to the image domain, our method outperforms previous state-of-the-art MLP-like architectures. Code is available at https://github.com/MTLab/MorphMLP.
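To make the factorized spatial-temporal design concrete, below is a minimal PyTorch sketch of one MorphMLP-style block. It is a simplified illustration, not the official implementation: it mixes tokens with plain FC layers along the height, width, and time axes with residual connections, and omits the paper's progressive chunk-wise "morphing" schedule and channel grouping. The class and parameter names (MorphFCBlock, fc_h, fc_w, fc_t) are hypothetical, chosen only for readability.

```python
import torch
import torch.nn as nn

class MorphFCBlock(nn.Module):
    """Simplified sketch of a MorphMLP-style block (assumed form, not official code).

    Input/output shape: (B, T, H, W, C) video features.
    - The spatial part (analogous to MorphFC_s) mixes tokens along H, then W,
      within each frame.
    - The temporal part (analogous to MorphFC_t) mixes tokens along T at each
      spatial location.
    """
    def __init__(self, dim, T, H, W):
        super().__init__()
        self.norm_s = nn.LayerNorm(dim)
        self.fc_h = nn.Linear(H, H)   # FC token mixing along the height axis
        self.fc_w = nn.Linear(W, W)   # FC token mixing along the width axis
        self.norm_t = nn.LayerNorm(dim)
        self.fc_t = nn.Linear(T, T)   # FC token mixing along the temporal axis

    def forward(self, x):  # x: (B, T, H, W, C)
        # Spatial token interaction per frame (residual).
        y = self.norm_s(x)
        y = self.fc_h(y.transpose(2, 4)).transpose(2, 4)  # mix along H
        y = self.fc_w(y.transpose(3, 4)).transpose(3, 4)  # mix along W
        x = x + y
        # Temporal token aggregation at each spatial location (residual).
        y = self.norm_t(x)
        y = self.fc_t(y.transpose(1, 4)).transpose(1, 4)  # mix along T
        return x + y

if __name__ == "__main__":
    block = MorphFCBlock(dim=96, T=8, H=14, W=14)
    x = torch.randn(2, 8, 14, 14, 96)
    print(block(x).shape)  # torch.Size([2, 8, 14, 14, 96])
```

Because every mixing operation is a linear layer over a single axis, the cost grows linearly in each of T, H, and W rather than quadratically in the token count, which is the source of the accuracy-computation balance the abstract describes.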