Mainstream Audio Analytics models are trained under the paradigm of one class label for many recordings, with each model focused on a single task. Learning under such restricted supervision limits model flexibility: the models require labeled audio for training and can only predict the predefined categories. Instead, we propose to learn audio concepts from natural language supervision. We call our approach Contrastive Language-Audio Pretraining (CLAP); it learns to connect language and audio by using two encoders and contrastive learning to bring audio and text descriptions into a joint multimodal space. We trained CLAP with 128k audio-text pairs and evaluated it on 16 downstream tasks across 8 domains, such as Sound Event Classification, music tasks, and speech-related tasks. Although CLAP was trained with significantly fewer pairs than comparable computer vision models, it establishes state-of-the-art (SoTA) Zero-Shot performance. Additionally, we evaluated CLAP in a supervised learning setup and achieved SoTA in 5 tasks. Hence, CLAP's Zero-Shot capability removes the need for training with class labels, enables flexible class prediction at inference time, and generalizes to multiple downstream tasks.
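To make the pretraining objective concrete, below is a minimal sketch of the symmetric contrastive loss used by CLIP-style dual-encoder models such as CLAP, operating on pre-computed audio and text embeddings. The function name, the temperature value, and the assumption that projections are already computed are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def clap_contrastive_loss(audio_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric cross-entropy over pairwise audio-text similarities."""
    # L2-normalize so the dot product equals cosine similarity.
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity logits: row i should match column i,
    # i.e., the i-th audio clip pairs with the i-th text description.
    logits = audio_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the audio-to-text and text-to-audio cross-entropy losses.
    loss_a2t = F.cross_entropy(logits, targets)
    loss_t2a = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_a2t + loss_t2a)

# Example: a batch of 8 paired (audio, text) embeddings of dimension 512.
if __name__ == "__main__":
    a = torch.randn(8, 512)
    t = torch.randn(8, 512)
    print(clap_contrastive_loss(a, t).item())
```

Pulling matched pairs together and pushing mismatched pairs apart in the shared space is what later enables class prediction from text alone.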
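The Zero-Shot capability follows directly from the joint space: class labels are wrapped in a text prompt, embedded, and the predicted class is the one whose text embedding is most similar to the audio embedding. The sketch below assumes hypothetical `audio_encoder` and `text_encoder` callables that return embedding tensors; the prompt template is the kind used for sound classification (e.g., "This is a sound of [label]."), shown here as an assumption rather than the paper's verbatim template.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_predict(audio_encoder, text_encoder, audio, class_names):
    """Predict a class for one audio clip using only class-name prompts."""
    # Illustrative prompt template; any natural-language description works.
    prompts = [f"This is a sound of {c}." for c in class_names]
    a = F.normalize(audio_encoder(audio), dim=-1)   # shape (1, d)
    t = F.normalize(text_encoder(prompts), dim=-1)  # shape (num_classes, d)
    scores = (a @ t.t()).squeeze(0)                 # cosine similarities
    return class_names[scores.argmax().item()]
```

Because the candidate classes are just text at inference time, the label set can be changed per query without retraining, which is the flexibility the abstract claims over fixed-category supervised models.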