VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding


May 20, 2021


Hu Xu, Gargi Ghosh, Po-Yao Huang, Prahal Arora, Masoumeh Aminzadeh, Christoph Feichtenhofer, Florian Metze, Luke Zettlemoyer

From arXiv, 9 pages, ACL Findings 2021

We present a simplified, task-agnostic multi-modal pre-training approach that can accept video input, text input, or both for a variety of end tasks. Existing pre-training approaches are task-specific: they adopt either a single cross-modal encoder that requires both modalities, which limits their use for retrieval-style end tasks, or more complex multitask learning with two unimodal encoders, which limits early cross-modal fusion. We instead introduce new pre-training masking schemes that better mix across modalities (e.g., by forcing masks for text to predict the closest video embeddings) while also maintaining separability (e.g., unimodal predictions are sometimes required, without using all the input). Experimental results show strong performance across a wider range of tasks than any previous method, often outperforming task-specific pre-training.
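The abstract describes the approach only at a high level. The following is a minimal Python sketch, not the authors' implementation, of two ideas it mentions. The first is a single shared encoder that either runs with full cross-modal attention (early fusion) or restricts attention to each modality, so separate unimodal outputs stay available for retrieval-style tasks. The second is a masking schedule that sometimes hides an entire modality so it must be predicted from the other one. The token layout (video tokens first, then text), the function names `build_attention_mask` and `sample_token_masks`, and the probabilities 0.5 and 0.15 are all illustrative assumptions, not values from the paper.

```python
import random
from typing import List, Optional, Tuple


def build_attention_mask(n_video: int, n_text: int, mode: str) -> List[List[bool]]:
    """Hypothetical helper: square self-attention mask over [video; text] tokens.

    Assumed layout: positions [0, n_video) are video tokens and
    [n_video, n_video + n_text) are text tokens. Entry [i][j] is True when
    query position i may attend to key position j.
    mode="joint":    every token sees every token (cross-modal fusion).
    mode="isolated": tokens see only their own modality, so the same
                     encoder yields separate unimodal representations.
    """
    if mode not in ("joint", "isolated"):
        raise ValueError(f"unknown mode: {mode}")
    n = n_video + n_text
    is_video = [i < n_video for i in range(n)]
    return [[mode == "joint" or is_video[i] == is_video[j] for j in range(n)]
            for i in range(n)]


def sample_token_masks(
    n_video: int,
    n_text: int,
    p_whole_modality: float = 0.5,  # assumed value, for illustration only
    p_token: float = 0.15,          # assumed BERT-style token masking rate
    rng: Optional[random.Random] = None,
) -> Tuple[List[bool], List[bool]]:
    """Hypothetical helper: choose which positions to mask for one example.

    With probability p_whole_modality, mask ALL tokens of one randomly chosen
    modality, so it has to be reconstructed from the other modality alone.
    Otherwise mask each token independently with probability p_token.
    Returns (video_mask, text_mask); True means "masked".
    """
    if rng is None:
        rng = random.Random()
    if rng.random() < p_whole_modality:
        if rng.random() < 0.5:
            return [True] * n_video, [False] * n_text
        return [False] * n_video, [True] * n_text
    video_mask = [rng.random() < p_token for _ in range(n_video)]
    text_mask = [rng.random() < p_token for _ in range(n_text)]
    return video_mask, text_mask


if __name__ == "__main__":
    # Two video tokens and three text tokens, attention isolated per modality.
    for row in build_attention_mask(2, 3, "isolated"):
        print(row)
    print(sample_token_masks(4, 6, rng=random.Random(0)))
```

Mask conventions differ between frameworks: for example, a True entry in the boolean `attn_mask` of PyTorch's `nn.MultiheadAttention` marks a position that may not be attended, so a mask built this way would need to be inverted before being passed there.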

