Automated Audio Captioning Using Audio Event Clues

Audio captioning is an important research area that aims to generate meaningful descriptions for audio clips. Most existing research extracts acoustic features from audio clips and feeds them to encoder-decoder and transformer architectures, which produce the captions in a sequence-to-sequence manner. Because of data insufficiency and the architectures' inadequate learning capacity, information beyond the acoustic features is needed to generate natural language sentences. To address these problems, an encoder-decoder architecture is proposed that takes both acoustic features and extracted audio event labels as inputs. The proposed model is based on pre-trained acoustic features and audio event detection. Experiments with different acoustic features, word embedding models, audio event label extraction methods, and implementation configurations show which combinations perform better on the audio captioning task. Results of the extensive experiments on multiple datasets show that using audio event labels together with the acoustic features improves the recognition performance, and that the proposed method either outperforms or achieves competitive results with state-of-the-art models.
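The idea of feeding a captioning model both acoustic features and audio event labels can be illustrated with a small sketch. The abstract does not say how the two inputs are combined, so everything below is an assumption made for illustration: the function names `embed_event_labels` and `fuse_inputs`, the toy embedding table, the mean-pooling of label embeddings, and the per-frame concatenation are hypothetical and are not the authors' method.

```python
from typing import Dict, List


def embed_event_labels(
    labels: List[str],
    embedding_table: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Mean-pool the embeddings of the detected event labels into one vector.

    Labels missing from the (hypothetical) embedding table are skipped;
    if none are known, a zero vector of length `dim` is returned.
    """
    known = [embedding_table[label] for label in labels if label in embedding_table]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def fuse_inputs(
    acoustic_frames: List[List[float]],
    event_vector: List[float],
) -> List[List[float]]:
    """Append the same event vector to every acoustic feature frame."""
    return [frame + event_vector for frame in acoustic_frames]


# Toy data: two frames of 3-dimensional acoustic features and
# 2-dimensional embeddings for two event labels (all values invented).
embedding_table = {"dog_bark": [1.0, 0.0], "rain": [0.0, 1.0]}
acoustic_frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

event_vector = embed_event_labels(["dog_bark", "rain"], embedding_table, dim=2)
fused = fuse_inputs(acoustic_frames, event_vector)

# Each fused frame now has 3 acoustic + 2 event dimensions and could be
# passed to the encoder of an encoder-decoder captioning model.
print(len(fused), len(fused[0]))
```

In a real system the acoustic frames would come from a pre-trained audio model and the labels from an audio event detector, as the abstract describes; concatenation is only one of several plausible ways to present both inputs to an encoder.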

