Language-based audio retrieval is a task in which natural-language captions are used as queries to retrieve audio signals from a dataset. It was first introduced in the DCASE 2022 Challenge as Subtask 6B of Task 6, which aims at developing computational systems that model relationships between audio signals and free-form textual descriptions. Compared with audio captioning (Subtask 6A), which generates captions for audio signals, language-based audio retrieval (Subtask 6B) focuses on ranking audio signals according to their relevance to natural-language captions. In the DCASE 2022 Challenge, submitted systems significantly outperformed the provided baseline for Subtask 6B, with the top system reaching 0.276 mAP@10. This paper presents the outcome of Subtask 6B, reporting the performance of the submitted systems and an analysis of them.
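For readers unfamiliar with the reported metric, the following is a minimal sketch of how mAP@10 can be computed for a caption-to-audio ranking. It is an illustration under stated assumptions, not the challenge's official evaluation code: the function names, the dictionary-based input layout, and the choice to normalize by min(number of relevant items, k) are all assumptions made here, and other normalization conventions exist.

```python
from typing import Dict, List, Sequence


def average_precision_at_k(ranked_ids: Sequence[str],
                           relevant_ids: Sequence[str],
                           k: int = 10) -> float:
    """Average precision over the top-k retrieved items for one query.

    Assumption: the sum of precisions at each hit is normalized by
    min(number of relevant items, k).
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(rankings: Dict[str, List[str]],
                                ground_truth: Dict[str, List[str]],
                                k: int = 10) -> float:
    """Mean of AP@k over all queries (captions)."""
    scores = [
        average_precision_at_k(rankings[query], ground_truth[query], k)
        for query in rankings
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two caption queries, each with one relevant audio file.
rankings = {"caption_1": ["a", "b", "c"], "caption_2": ["x", "y", "z"]}
ground_truth = {"caption_1": ["b"], "caption_2": ["x"]}
score = mean_average_precision_at_k(rankings, ground_truth, k=10)
```

When each caption has exactly one relevant audio signal, AP@10 under this definition reduces to the reciprocal of that signal's rank if it appears in the top 10, and 0 otherwise. In the toy data the relevant items sit at ranks 2 and 1, so the per-query values are 0.5 and 1.0.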