We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks. VideoCLIP trains a transformer for video and text by contrasting temporally overlapping positive video-text pairs with hard negatives from nearest-neighbor retrieval. Our experiments on a diverse series of downstream tasks, including sequence-level text-video retrieval, VideoQA, token-level action localization, and action segmentation, reveal state-of-the-art performance, surpassing prior work and in some cases even outperforming supervised approaches. Code is available at https://github.com/pytorch/fairseq/examples/MMPT.
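To make the objective concrete, the sketch below shows a symmetric InfoNCE-style contrastive loss of the kind the abstract describes: video and text clips that temporally overlap form positive pairs, while the remaining pairs in the batch (which VideoCLIP further augments with retrieval-based hard negatives) serve as negatives. This is a minimal illustration, not the authors' implementation; the function name, tensor shapes, and temperature value are assumptions for exposition.

```python
# Minimal sketch of a symmetric video-text contrastive loss.
# Assumptions: video_emb and text_emb are (batch, dim) pooled embeddings,
# where row i of each tensor comes from a temporally overlapping pair.
import torch
import torch.nn.functional as F

def video_text_contrastive_loss(video_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize so dot products are cosine similarities.
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds positive pairs,
    # off-diagonal entries act as in-batch negatives.
    logits = video_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over both retrieval directions.
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_v2t + loss_t2v)
```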