自动音频说明:近期进展和新挑战概览 (Automated Audio Captioning: an Overview of Recent Progress and New Challenges)

Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips. This task has received increasing attention with the release of freely available datasets in recent years. The problem has been addressed predominantly with deep learning techniques. Numerous approaches have been proposed, such as investigating different neural network architectures, exploiting auxiliary information such as keywords or sentence information to guide caption generation, and employing different training strategies, which have greatly facilitated the development of this field. In this paper, we present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets. Moreover, we discuss open challenges and envisage possible future research directions.

翻译：自动音频字幕是一项跨模式的翻译任务,旨在为特定音频剪辑制作自然语言描述,近年来,随着免费提供数据集的发布,这项任务日益受到重视,该问题主要通过深层学习技术得到解决,提出了许多办法,例如调查不同的神经网络结构,利用关键词或句子信息等辅助信息指导字幕生成,以及采用各种培训战略,大大促进了该领域的发展。我们在本文件中全面审查了从现有各种评价指标和数据集的方法来看,在自动音频字幕中发表的贡献。此外,我们讨论了公开的挑战,并设想了今后可能的研究方向。

相关内容

Automator

关注 5

Automator是苹果公司为他们的Mac OS X系统开发的一款软件。 只要通过点击拖拽鼠标等操作就可以将一系列动作组合成一个工作流，从而帮助你自动的（可重复的）完成一些复杂的工作。Automator还能横跨很多不同种类的程序，包括：查找器、Safari网络浏览器、iCal、地址簿或者其他的一些程序。它还能和一些第三方的程序一起工作，如微软的Office、Adobe公司的Photoshop或者Pixelmator等。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日