通过培训前愿景和语言模式的多式开放语言视频视频分类 (Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models) - 专知论文

会员服务 ·

0

视频分类 · 语言模型化 · 多峰值 · Vision · MoDELS ·

2022 年 7 月 15 日

Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

翻译：通过培训前愿景和语言模式的多式开放语言视频视频分类

Rui Qian,Yeqing Li,Zheng Xu,Ming-Hsuan Yang,Serge Belongie,Yin Cui

Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition. In this work, we extend this paradigm by leveraging motion and audio that naturally exist in video. We present \textbf{MOV}, a simple yet effective method for \textbf{M}ultimodal \textbf{O}pen-\textbf{V}ocabulary video classification. In MOV, we directly use the vision encoder from pre-trained VLMs with minimal modifications to encode video, optical flow and audio spectrogram. We design a cross-modal fusion mechanism to aggregate complimentary multimodal information. Experiments on Kinetics-700 and VGGSound show that introducing flow or audio modality brings large performance gains over the pre-trained VLM and existing methods. Specifically, MOV greatly improves the accuracy on base classes, while generalizes better on novel classes. MOV achieves state-of-the-art results on UCF and HMDB zero-shot video classification benchmarks, significantly outperforming both traditional zero-shot methods and recent methods based on VLMs. Code and models will be released.

翻译：使用在大型图像- 文本配对上经过预先培训的视觉和语言模型( VLMs) 正在成为开放词汇视频识别的一个很有希望的范例。在这项工作中,我们通过利用视频中自然存在的运动和音频来扩展这一范例。我们展示了 kindetics- 700 和 VGSound 实验显示,引入流或音频模式会给经过培训的VLM 和现有方法带来巨大的性能收益。具体地说, MOV 大大提高了基础班的准确性,同时在新型班级上作了更好的概括化。 MOV 实现了对UCF和HMDB 的状态艺术结果, 并尽可能对编码视频、光学流和音频光谱进行了修改。我们设计了一个跨模式融合机制,以汇总补充性多式信息。关于Kindicatics- 700 和 VGGSound的实验显示, 引入流或音频模式会大大提高VL 和现有方法的绩效。具体地说, MOD- 将基于最近零光谱和HD- DVshal- 的Vshal- CD- sal- 和Vsh- sal- sal- sal- slation- slation- sal- slational- sal- slationaldal- slational- slations bal- slationaldaldalbaldaldalbaldaldaldaldaldaldaldaldaldgaldaldaldaldaldaldaldaldalddgyd 和Vs 和Vs- sald- saldrogleg) 将显著方法大大超越了V 和VM- 和VD- sald- sald- 和VD- 和VD- 和VD- saldgaldgaldaldal- sal- sal- sal- sald- sal- sald- sal- sal- sald- sal- sal- sal- sald- sald- sal- sald- sald- sal- sal- sal-

0

相关内容

视频分类

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

副溶血弧菌VI型分泌系统的表型功能及基因调控研究

国家自然科学基金

1+阅读 · 2014年12月31日

Cofilin在Erucin诱导的乳腺癌细胞线粒体分裂和细胞凋亡中的作用及分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

高阶Schwarz导数与Teichmuller空间紧化

国家自然科学基金

0+阅读 · 2012年12月31日

缺血性脑损伤介导的ErbB4胞内结构域分解的分子机制及作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

必需调控系统PhoPQ快速起源及与细菌有丝分裂的调控关系

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

多铁性材料的低温磁性、磁相变与热输运特性的研究

国家自然科学基金

0+阅读 · 2011年12月31日

肌萎缩侧索硬化脑和脊髓内源性神经前体细胞增殖、分化、定向迁移及其调控因子HoxB家族表达改变的实验研究

国家自然科学基金

0+阅读 · 2011年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

多文种文档图像识别的多层次马尔可夫随机场模型研究

国家自然科学基金

1+阅读 · 2008年12月31日

Learning Audio-Visual embedding for Wild Person Verification

Arxiv

0+阅读 · 2022年9月9日

Plant Species Classification Using Transfer Learning by Pretrained Classifier VGG-19

Arxiv

0+阅读 · 2022年9月7日

Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning

Arxiv

0+阅读 · 2022年9月7日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

A Survey on Visual Transformer

Arxiv

19+阅读 · 2020年12月23日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

BERT for Joint Intent Classification and Slot Filling

Arxiv

13+阅读 · 2019年2月28日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

小规模训练指南：打造世界级大语言模型的关键方法

无人机编队飞行：复杂环境中作战的策略、挑战与应用

大模型APP，AI时代第一个爆款

从数据中心视角出发的高效大语言模型训练综述

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Learning Audio-Visual embedding for Wild Person Verification

Arxiv

0+阅读 · 2022年9月9日

Plant Species Classification Using Transfer Learning by Pretrained Classifier VGG-19

Arxiv

0+阅读 · 2022年9月7日

Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning

Arxiv

0+阅读 · 2022年9月7日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

A Survey on Visual Transformer

Arxiv

19+阅读 · 2020年12月23日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

BERT for Joint Intent Classification and Slot Filling

Arxiv

13+阅读 · 2019年2月28日

相关基金

副溶血弧菌VI型分泌系统的表型功能及基因调控研究

国家自然科学基金

1+阅读 · 2014年12月31日

Cofilin在Erucin诱导的乳腺癌细胞线粒体分裂和细胞凋亡中的作用及分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

高阶Schwarz导数与Teichmuller空间紧化

国家自然科学基金

0+阅读 · 2012年12月31日

缺血性脑损伤介导的ErbB4胞内结构域分解的分子机制及作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

必需调控系统PhoPQ快速起源及与细菌有丝分裂的调控关系

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

多铁性材料的低温磁性、磁相变与热输运特性的研究

国家自然科学基金

0+阅读 · 2011年12月31日

肌萎缩侧索硬化脑和脊髓内源性神经前体细胞增殖、分化、定向迁移及其调控因子HoxB家族表达改变的实验研究

国家自然科学基金

0+阅读 · 2011年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

多文种文档图像识别的多层次马尔可夫随机场模型研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员