Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation



November 12, 2022


Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov

Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a pipeline of contrastive language-audio pretraining to develop an audio representation by combining audio data with natural language descriptions. To accomplish this goal, we first release LAION-Audio-630K, a large collection of 633,526 audio-text pairs from different data sources. Second, we construct a contrastive language-audio pretraining model by considering different audio encoders and text encoders. We incorporate a feature fusion mechanism and keyword-to-caption augmentation into the model design, enabling the model to process audio inputs of variable lengths and further enhancing its performance. Third, we perform comprehensive experiments to evaluate our model on three tasks: text-to-audio retrieval, zero-shot audio classification, and supervised audio classification. The results demonstrate that our model achieves superior performance on the text-to-audio retrieval task. In audio classification, the model achieves state-of-the-art performance in the zero-shot setting and obtains performance comparable to that of models in the non-zero-shot setting. Both LAION-Audio-630K and the proposed model are publicly available.
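The abstract does not spell out the training objective. As a minimal sketch, assuming a CLIP-style symmetric contrastive (InfoNCE) loss, the snippet below shows how a batch of paired audio and text embeddings might be scored: the i-th clip and the i-th caption form the positive pair, and every other caption (or clip) in the batch acts as a negative. The toy vectors stand in for real encoder outputs, and the temperature of 0.07 is an assumed value, not one taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(audio: List[Vector], text: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch (assumption: CLIP-style objective, illustrative temperature)."""
    a = [l2_normalize(v) for v in audio]
    t = [l2_normalize(v) for v in text]
    # logits[i][j] = cos(audio_i, text_j) / temperature
    logits = [[dot(ai, tj) / temperature for tj in t] for ai in a]
    # The transpose scores each caption against every audio clip (text-to-audio direction).
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


if __name__ == "__main__":
    audio_emb = [[1.0, 0.0], [0.0, 1.0]]
    text_emb = [[0.9, 0.1], [0.1, 0.9]]
    print(contrastive_loss(audio_emb, text_emb))        # close to 0: each clip matches its own caption
    print(contrastive_loss(audio_emb, text_emb[::-1]))  # large: the captions are swapped
```

Averaging the two directions makes the objective treat audio-to-text and text-to-audio matching symmetrically; a production version would use a tensor library and batched matrix products instead of Python lists.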

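Zero-shot audio classification and text-to-audio retrieval both reduce to ranking by embedding similarity. The sketch below assumes cosine similarity, a hypothetical prompt template (`make_prompt`), and toy vectors in place of real audio and text encoder outputs; it illustrates the general technique rather than the authors' evaluation code. `recall_at_k` (also a hypothetical helper) computes the fraction of captions whose paired clip appears among the top-k retrieved clips, a common retrieval metric.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def make_prompt(label: str) -> str:
    # Hypothetical prompt template; this text would be fed to the text encoder.
    return f"This is a sound of {label}."


def zero_shot_classify(audio_emb: Vector, prompt_embs: Dict[str, Vector]) -> str:
    """Return the label whose prompt embedding is most similar to the audio embedding."""
    return max(prompt_embs, key=lambda label: cosine(audio_emb, prompt_embs[label]))


def recall_at_k(text_embs: List[Vector], audio_embs: List[Vector], k: int) -> float:
    """Fraction of captions whose paired clip (same index) ranks in the top k by similarity."""
    hits = 0
    for i, t in enumerate(text_embs):
        ranked = sorted(range(len(audio_embs)), key=lambda j: cosine(t, audio_embs[j]), reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_embs)


if __name__ == "__main__":
    labels = ["dog barking", "rain", "siren"]
    prompts = [make_prompt(label) for label in labels]  # inputs to the (not shown) text encoder
    # Toy vectors standing in for the text encoder's output on each prompt.
    prompt_embs = {"dog barking": [1.0, 0.1, 0.0], "rain": [0.0, 1.0, 0.2], "siren": [0.1, 0.0, 1.0]}
    print(zero_shot_classify([0.2, 0.9, 0.1], prompt_embs))  # → rain
```

No classifier is trained here: the class set is defined entirely by the text prompts, which is what makes the setting zero-shot.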
