Masked Autoencoders that Listen - Zhuanzhi Papers


Masking · Autoencoders · Masked Autoencoders (MAE) · Extensibility · Decoding

July 26, 2022

Masked Autoencoders that Listen


Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer

From arXiv, technical report

This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens, in order to reconstruct the input spectrogram. We find it beneficial to incorporate local window attention in the decoder, as audio spectrograms are highly correlated in local time and frequency bands. We then fine-tune the encoder with a lower masking ratio on target datasets. Empirically, Audio-MAE sets new state-of-the-art performance on six audio and speech classification tasks, outperforming other recent models that use external supervised pre-training. The code and models will be at https://github.com/facebookresearch/AudioMAE.
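The masking and re-ordering steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`patchify`, `random_masking`, `restore_with_mask_tokens`), the 16x16 patch size, the 128x128 spectrogram shape, and the 0.75 masking ratio are all assumptions chosen for illustration, and the Transformer encoder and decoder themselves are omitted.

```python
import numpy as np


def patchify(spec, patch):
    """Split a (time, freq) spectrogram into flattened non-overlapping patches."""
    t, f = spec.shape
    assert t % patch == 0 and f % patch == 0, "spectrogram must tile evenly"
    x = spec.reshape(t // patch, patch, f // patch, patch)
    x = x.transpose(0, 2, 1, 3)  # (time blocks, freq blocks, patch, patch)
    return x.reshape(-1, patch * patch)


def random_masking(tokens, mask_ratio, rng):
    """Keep a random subset of tokens; only these would be fed to the encoder."""
    num_patches = tokens.shape[0]
    num_keep = int(num_patches * (1 - mask_ratio))
    ids_shuffle = rng.permutation(num_patches)  # random order of patch indices
    ids_restore = np.argsort(ids_shuffle)       # inverse permutation
    ids_keep = ids_shuffle[:num_keep]
    mask = np.ones(num_patches, dtype=bool)     # True marks a masked patch
    mask[ids_keep] = False
    return tokens[ids_keep], mask, ids_restore


def restore_with_mask_tokens(encoded, ids_restore, mask_token):
    """Pad encoded tokens with mask tokens and put them back in original order."""
    num_masked = ids_restore.shape[0] - encoded.shape[0]
    padding = np.tile(mask_token, (num_masked, 1))
    padded = np.concatenate([encoded, padding], axis=0)  # still in shuffled order
    return padded[ids_restore]


rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 128))   # hypothetical log-mel spectrogram
tokens = patchify(spec, patch=16)        # 64 patches of 256 values each
kept, mask, ids_restore = random_masking(tokens, mask_ratio=0.75, rng=rng)
# An identity "encoder" stands in for the Transformer encoder here.
decoder_input = restore_with_mask_tokens(kept, ids_restore, np.zeros(tokens.shape[1]))
print(kept.shape, decoder_input.shape)
```

In this sketch the decoder input has one row per original patch, with the visible patches at their original positions and a placeholder mask token everywhere else, which is the layout a reconstruction decoder would consume.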


