Contrastive Audio-Visual Masked Autoencoder - Zhuanzhi Papers


CAV · Masked Autoencoder (MAE) · Autoencoder · Supervised Pre-training

April 11, 2023

Contrastive Audio-Visual Masked Autoencoder


Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

From arXiv. Accepted at ICLR 2023 as a notable top 25% paper. Code and pretrained models are at https://github.com/yuangongnd/cav-mae.

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities. Subsequently, we propose the Contrastive Audio-Visual Masked Auto-Encoder (CAV-MAE) by combining contrastive learning and masked data modeling, two major self-supervised learning frameworks, to learn a joint and coordinated audio-visual representation. Our experiments show that the contrastive audio-visual correspondence learning objective not only enables the model to perform audio-visual retrieval tasks, but also helps the model learn a better joint representation. As a result, our fully self-supervised pretrained CAV-MAE achieves a new SOTA accuracy of 65.9% on VGGSound, and is comparable with the previous best supervised pretrained model on AudioSet in the audio-visual event classification task. Code and pretrained models are at https://github.com/yuangongnd/cav-mae.

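The abstract describes training with two objectives at once: masked data modeling (reconstructing hidden patches) and contrastive audio-visual correspondence learning. The sketch below illustrates that general idea with NumPy. It is not the authors' implementation (see their repository for that). The function names, the temperature, and the weighting between the two terms are illustrative assumptions, and the contrastive term is a generic InfoNCE-style loss rather than the paper's exact formulation.

```python
import numpy as np


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error averaged over masked patches only.

    pred, target: arrays of shape (n_patches, patch_dim).
    mask: array of shape (n_patches,), 1.0 where the patch was hidden.
    Assumption: visible patches do not contribute to the loss.
    """
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float((per_patch * mask).sum() / mask.sum())


def contrastive_loss(audio, video, temperature=0.05):
    """Generic InfoNCE-style loss over a batch of paired embeddings.

    audio, video: arrays of shape (batch, dim); row i of each is a matched pair.
    temperature: illustrative value, not taken from the paper.
    """
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    v = video / np.linalg.norm(video, axis=1, keepdims=True)
    logits = a @ v.T / temperature
    # Subtract the row-wise maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matched pairs sit on the diagonal of the similarity matrix.
    return float(-np.mean(np.diag(log_probs)))


def combined_loss(reconstruction, contrastive, weight=0.01):
    """Weighted sum of the two objectives; `weight` is a hypothetical hyperparameter."""
    return reconstruction + weight * contrastive


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred = rng.normal(size=(8, 16))
    target = rng.normal(size=(8, 16))
    mask = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0])  # 6 of 8 patches hidden
    audio = rng.normal(size=(4, 32))
    video = rng.normal(size=(4, 32))
    recon = masked_reconstruction_loss(pred, target, mask)
    contrast = contrastive_loss(audio, video)
    print(combined_loss(recon, contrast))
```

In this sketch the contrastive term is near zero when each audio embedding is closest to its own paired video embedding and grows when pairs are mismatched, which is the property that makes retrieval possible with such embeddings.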

