VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training - Zhuanzhi Papers

Masking · Learner · Autoencoder · Performer · Dataset

October 18, 2022

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Zhan Tong, Yibing Song, Jue Wang, Limin Wang

From arXiv; NeurIPS 2022 camera-ready version

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets. In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). We are inspired by the recent ImageMAE and propose customized video tube masking with an extremely high ratio. This simple design makes video reconstruction a more challenging self-supervision task, thus encouraging extracting more effective video representations during this pre-training process. We obtain three important findings on SSVP: (1) An extremely high proportion of masking ratio (i.e., 90% to 95%) still yields favorable performance of VideoMAE. The temporally redundant video content enables a higher masking ratio than that of images. (2) VideoMAE achieves impressive results on very small datasets (i.e., around 3k-4k videos) without using any extra data. (3) VideoMAE shows that data quality is more important than data quantity for SSVP. Domain shift between pre-training and target datasets is an important issue. Notably, our VideoMAE with the vanilla ViT can achieve 87.4% on Kinetics-400, 75.4% on Something-Something V2, 91.3% on UCF101, and 62.6% on HMDB51, without using any extra data. Code is available at https://github.com/MCG-NJU/VideoMAE.
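The tube masking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes the video has already been split into a grid of patch tokens, and the function name `tube_mask`, the 8 temporal slices, and the 196 patches per slice are hypothetical values chosen only for illustration. The idea shown is that one spatial mask is sampled and reused for every temporal slice, so a masked position stays hidden along the whole time axis.

```python
import random


def tube_mask(num_frames, num_patches_per_frame, mask_ratio=0.9, seed=None):
    """Return a num_frames x num_patches_per_frame grid of booleans (True = masked).

    A single spatial mask is sampled once and repeated for every frame, so each
    masked patch position forms a "tube" through time. Illustrative sketch only.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Number of spatial positions to hide in each frame.
    num_masked = int(mask_ratio * num_patches_per_frame)
    masked_positions = set(rng.sample(range(num_patches_per_frame), num_masked))
    frame_mask = [p in masked_positions for p in range(num_patches_per_frame)]
    # Copy the same spatial mask to every temporal slice.
    return [list(frame_mask) for _ in range(num_frames)]


# Hypothetical sizes: 8 temporal slices, 14 x 14 = 196 patches per slice.
mask = tube_mask(num_frames=8, num_patches_per_frame=196, mask_ratio=0.9, seed=0)
visible_per_frame = sum(not m for m in mask[0])
print(f"visible patches per frame: {visible_per_frame} of {len(mask[0])}")
```

With a 90% ratio only 20 of the 196 positions per slice stay visible in this sketch. Because the same positions are hidden in every slice, a masked patch cannot simply be copied from a neighbouring frame, which is the intuition behind the abstract's remark that temporal redundancy permits a higher masking ratio than in images.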

Related papers

Good helper is around you: Attention-driven Masked Image Modeling
arXiv · 0+ reads · November 28, 2022

Contrastive Masked Autoencoders are Stronger Vision Learners
arXiv · 0+ reads · November 28, 2022

BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular Representation
arXiv · 0+ reads · November 25, 2022

XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
arXiv · 0+ reads · November 25, 2022

MixMask: Revisiting Masked Siamese Self-supervised Learning in Asymmetric Distance
arXiv · 0+ reads · November 24, 2022

Masked Autoencoders Are Scalable Vision Learners
arXiv · 27+ reads · November 11, 2021

Dense Contrastive Learning for Self-Supervised Visual Pre-Training
arXiv · 18+ reads · April 4, 2021

MVFNet: Multi-View Fusion Network for Efficient Video Recognition
arXiv · 13+ reads · January 5, 2021

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks
arXiv · 13+ reads · April 13, 2020

End-to-End Dense Video Captioning with Masked Transformer
arXiv · 14+ reads · April 3, 2018
