Everyday sound recognition aims to infer the types of sound events in audio streams. While many works have succeeded in training high-performing models in a fully supervised manner, they remain constrained by the need for large quantities of labelled data and by a fixed range of predefined classes. To overcome these drawbacks, this work first curates a new database named FSD-FS for multi-label few-shot audio classification. It then explores how to incorporate audio taxonomy into few-shot learning. Specifically, this work proposes label-dependent prototypical networks (LaD-protonet) to exploit parent-child relationships between labels. In addition, it applies taxonomy-aware label smoothing techniques to boost model performance. Experiments demonstrate that LaD-protonet outperforms the original prototypical networks as well as other state-of-the-art methods. Moreover, its performance can be further improved when it is combined with taxonomy-aware label smoothing.
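The abstract names its building blocks without specifying them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows standard prototypical-network scoring (prototype = mean support embedding, score from squared Euclidean distance) adapted to multi-label episodes, plus one hypothetical form of taxonomy-aware label smoothing. It does not model LaD-protonet's label-dependent mechanism, which the abstract does not describe. The function names, the sigmoid scoring with a `bias` offset, and the sibling-based smoothing rule are all assumptions made for illustration.

```python
import numpy as np


def compute_prototypes(support_emb, support_labels):
    """Mean embedding of the support clips that carry each label.

    support_emb: (n_support, dim) embeddings.
    support_labels: (n_support, n_classes) multi-hot matrix; a clip with
    several labels contributes to several prototypes.
    """
    counts = support_labels.sum(axis=0)            # clips per class
    sums = support_labels.T @ support_emb          # (n_classes, dim)
    return sums / np.maximum(counts, 1)[:, None]   # avoid division by zero


def class_scores(query_emb, protos, bias=0.0):
    """Independent per-class probabilities from squared Euclidean distance.

    Assumption: sigmoid(bias - distance) replaces the softmax of the
    single-label protonet so that several classes can be active at once;
    `bias` is a hypothetical offset that would normally be learned.
    """
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 / (1.0 + np.exp(d2 - bias))


def taxonomy_smooth(targets, parent_of, eps=0.1):
    """Hypothetical taxonomy-aware label smoothing (illustrative only).

    targets: (n, n_classes) multi-hot matrix.
    parent_of: parent id of each class index.
    For every positive label, `eps` of its mass is moved evenly onto its
    siblings (classes sharing the same parent) instead of onto all classes.
    """
    targets = np.asarray(targets, dtype=float)
    n_classes = targets.shape[1]
    smoothed = targets.copy()
    for c in range(n_classes):
        siblings = [s for s in range(n_classes)
                    if s != c and parent_of[s] == parent_of[c]]
        if not siblings:
            continue  # no siblings: leave this label unsmoothed
        pos = targets[:, c] > 0  # rows where class c is a positive label
        smoothed[pos, c] -= eps
        smoothed[np.ix_(pos, siblings)] += eps / len(siblings)
    return np.clip(smoothed, 0.0, 1.0)
```

For example, with `parent_of = [0, 0, 1, 1, 1]` a clip labelled only with class 0 would receive a soft target of 0.9 for class 0 and 0.1 for its sibling, class 1, while classes under the other parent stay at 0. Restricting the smoothed mass to siblings is one way to encode the intuition that confusing related sounds should be penalised less than confusing unrelated ones.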