FSD50K: An Open Dataset of Human-Labeled Sound Events

From arXiv; accepted version in TASLP. Main updates include: an estimate of the amount of label noise in FSD50K, an SNR comparison between FSD50K and AudioSet, an improved description of the evaluation metrics (including equations), clarification of the experimental methodology and some results, and some content moved to the Appendix for readability. https://ieeexplore.ieee.org/document/9645159

Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes. However, AudioSet is not an open dataset as its official release consists of pre-computed audio features. Downloading the original audio tracks can be problematic due to YouTube videos gradually disappearing and usage rights issues. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51k audio clips totalling over 100h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including challenges encountered and solutions adopted. We include a comprehensive dataset characterization along with discussion of limitations and key factors to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight on the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset to be widely adopted by the community as a new open benchmark for SER research.


