CL4AC: 音频字幕的矛盾损失 (CL4AC: A Contrastive Loss for Audio Captioning) - 专知论文

会员服务 ·

0

contrastive · AAC · Automator · Performer · 损失 ·

2021 年 11 月 22 日

CL4AC: A Contrastive Loss for Audio Captioning

翻译：CL4AC: 音频字幕的矛盾损失

Xubo Liu,Qiushi Huang,Xinhao Mei,Tom Ko,H Lilian Tang,Mark D. Plumbley,Wenwu Wang

from arxiv, The first two authors contributed equally, 5 pages, 3 figures, accepted by DCASE2021 Workshop

Automated Audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip. As shown in the submissions received for Task 6 of the DCASE 2021 Challenges, this problem has received increasing interest in the community. The existing AAC systems are usually based on an encoder-decoder architecture, where the audio signal is encoded into a latent representation, and aligned with its corresponding text descriptions, then a decoder is used to generate the captions. However, training of an AAC system often encounters the problem of data scarcity, which may lead to inaccurate representation and audio-text alignment. To address this problem, we propose a novel encoder-decoder framework called Contrastive Loss for Audio Captioning (CL4AC). In CL4AC, the self-supervision signals derived from the original audio-text paired data are used to exploit the correspondences between audio and texts by contrasting samples, which can improve the quality of latent representation and the alignment between audio and texts, while trained with limited data. Experiments are performed on the Clotho dataset to show the effectiveness of our proposed approach.

翻译：自动音频字幕(AAC)是一项跨模式翻译任务,目的是使用自然语言描述音频剪辑的内容。正如为DCASE 2021挑战第6号任务收到的提交材料所示,这一问题在社区中受到越来越多的关注。现有的AAC系统通常基于编码器脱代码器结构,其中音频信号编码成潜代号,并与相应的文本说明相一致,然后使用解码器生成字幕。然而,AAC系统的培训常常遇到数据稀缺问题,这可能导致陈述不准确和音频文本协调。为了解决这一问题,我们提议了一个名为“声音捕获的对比丢失”的新的编码解码器框架。在CL4AC中,从原始的音频文本配对数据中产生的自我监督信号被用于通过对比样本来利用音频和文本之间的对应,这可以提高潜在表述的质量,使音频和文本之间的一致性得到改进,同时用有限的数据加以培训。在Clotho数据上进行了实验,以显示我们拟议的数据的有效性。

0

相关内容

contrastive

【ACMMM2021】问题控制的文本感知图像描述生成

专知会员服务

19+阅读 · 2021年9月23日

近期必读的5篇顶会CVPR 2021【图像/视频描述生成】相关论文和代码

专知会员服务

48+阅读 · 2021年4月25日

[CVPR 2021] 序列到序列对比学习的文本识别

[CVPR 2021] 序列到序列对比学习的文本识别

专知会员服务

29+阅读 · 2021年4月14日

【CVPR2021】空间一致性表示学习

专知会员服务

63+阅读 · 2021年3月12日

【AAAI2021】对比聚类，Contrastive Clustering

【AAAI2021】对比聚类，Contrastive Clustering

专知会员服务

78+阅读 · 2021年1月30日

【google】监督对比学习，Supervised Contrastive Learning

【google】监督对比学习，Supervised Contrastive Learning

专知会员服务

32+阅读 · 2020年4月23日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【ACM Multimedia 2019 Tutorial】机器学习音频和多媒体数据的再现性和实验设计（Reproducibility and Experimental Design for Machine Learning on Audio and Multimedia Data），Gerald Friedland

【ACM Multimedia 2019 Tutorial】机器学习音频和多媒体数据的再现性和实验设计（Reproducibility and Experimental Design for Machine Learning on Audio and Multimedia Data），Gerald Friedland

专知会员服务

5+阅读 · 2019年11月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新八篇视频描述生成相关论文—在线视频理解、联合定位和描述事件、生成视频、跨模态注意力机制、联合事件检测和描述

【论文推荐】最新八篇视频描述生成相关论文—在线视频理解、联合定位和描述事件、生成视频、跨模态注意力机制、联合事件检测和描述

专知

11+阅读 · 2018年6月4日

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

专知

25+阅读 · 2018年5月28日

【论文推荐】最新六篇对抗自编码器相关论文—多尺度网络节点表示、生成对抗自编码、逆映射、Wasserstein、条件对抗、去噪

【论文推荐】最新六篇对抗自编码器相关论文—多尺度网络节点表示、生成对抗自编码、逆映射、Wasserstein、条件对抗、去噪

专知

20+阅读 · 2018年4月7日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

文字描述生成视频的开源项目

文字描述生成视频的开源项目

CreateAMind

5+阅读 · 2017年12月31日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation

Arxiv

11+阅读 · 2021年12月16日

Open-book Video Captioning with Retrieve-Copy-Generate Network

Arxiv

7+阅读 · 2021年3月9日

Semantic Grouping Network for Video Captioning

Arxiv

8+阅读 · 2021年2月1日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

Image Captioning through Image Transformer

Arxiv

3+阅读 · 2020年4月29日

Unsupervised Image Captioning

Arxiv

7+阅读 · 2018年11月27日

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Arxiv

4+阅读 · 2018年11月26日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Controllable Generative Adversarial Network

Arxiv

5+阅读 · 2018年5月1日

VIP会员

文章信息

相关主题

相关VIP内容

【ACMMM2021】问题控制的文本感知图像描述生成

专知会员服务

19+阅读 · 2021年9月23日

近期必读的5篇顶会CVPR 2021【图像/视频描述生成】相关论文和代码

专知会员服务

48+阅读 · 2021年4月25日

[CVPR 2021] 序列到序列对比学习的文本识别

[CVPR 2021] 序列到序列对比学习的文本识别

专知会员服务

29+阅读 · 2021年4月14日

【CVPR2021】空间一致性表示学习

专知会员服务

63+阅读 · 2021年3月12日

【AAAI2021】对比聚类，Contrastive Clustering

【AAAI2021】对比聚类，Contrastive Clustering

专知会员服务

78+阅读 · 2021年1月30日

【google】监督对比学习，Supervised Contrastive Learning

【google】监督对比学习，Supervised Contrastive Learning

专知会员服务

32+阅读 · 2020年4月23日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【ACM Multimedia 2019 Tutorial】机器学习音频和多媒体数据的再现性和实验设计（Reproducibility and Experimental Design for Machine Learning on Audio and Multimedia Data），Gerald Friedland

【ACM Multimedia 2019 Tutorial】机器学习音频和多媒体数据的再现性和实验设计（Reproducibility and Experimental Design for Machine Learning on Audio and Multimedia Data），Gerald Friedland

专知会员服务

5+阅读 · 2019年11月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

基于大型语言模型的网络威胁情报：利用LLM提取MITRE ATT&CK技术 | 最新文献

无人机（UAV）战略：区域大国与暴力非国家行为体在中东冲突中对无人机的运用 | 130页

神经技术与未来无人机战争的交汇点 | 最新报告

美国从“蛛网行动”中汲取轰炸机舰队保护教训

相关资讯

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新八篇视频描述生成相关论文—在线视频理解、联合定位和描述事件、生成视频、跨模态注意力机制、联合事件检测和描述

【论文推荐】最新八篇视频描述生成相关论文—在线视频理解、联合定位和描述事件、生成视频、跨模态注意力机制、联合事件检测和描述

专知

11+阅读 · 2018年6月4日

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

专知

25+阅读 · 2018年5月28日

【论文推荐】最新六篇对抗自编码器相关论文—多尺度网络节点表示、生成对抗自编码、逆映射、Wasserstein、条件对抗、去噪

【论文推荐】最新六篇对抗自编码器相关论文—多尺度网络节点表示、生成对抗自编码、逆映射、Wasserstein、条件对抗、去噪

专知

20+阅读 · 2018年4月7日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

文字描述生成视频的开源项目

文字描述生成视频的开源项目

CreateAMind

5+阅读 · 2017年12月31日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

相关论文

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation

Arxiv

11+阅读 · 2021年12月16日

Open-book Video Captioning with Retrieve-Copy-Generate Network

Arxiv

7+阅读 · 2021年3月9日

Semantic Grouping Network for Video Captioning

Arxiv

8+阅读 · 2021年2月1日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

Image Captioning through Image Transformer

Arxiv

3+阅读 · 2020年4月29日

Unsupervised Image Captioning

Arxiv

7+阅读 · 2018年11月27日

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Arxiv

4+阅读 · 2018年11月26日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Controllable Generative Adversarial Network

Arxiv

5+阅读 · 2018年5月1日

微信扫码咨询专知VIP会员