The availability of audio data on sound sharing platforms such as Freesound gives users access to large amounts of annotated audio. Utilising such data for training is becoming increasingly popular, but the label noise that is often prevalent in such datasets requires further investigation. This paper introduces ARCA23K, an Automatically Retrieved and Curated Audio dataset comprising over 23,000 labelled Freesound clips. Unlike past datasets such as FSDKaggle2018 and FSDnoisy18K, ARCA23K facilitates the study of label noise in a more controlled manner. We describe the entire process of creating the dataset so that it is fully reproducible and researchers can extend our work with little effort. We show that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and we refer to this type of label noise as open-set label noise. We carry out experiments to study the impact of label noise on classification performance and representation learning.