ARCA23K:用于调查开放式标签噪音的音频数据集 (ARCA23K: An audio dataset for investigating open-set label noise) - 专知论文

会员服务 ·

0

噪声 · 标注 · 数据集 · Performer · Processing（编程语言） ·

2021 年 9 月 19 日

ARCA23K: An audio dataset for investigating open-set label noise

翻译：ARCA23K:用于调查开放式标签噪音的音频数据集

Turab Iqbal,Yin Cao,Andrew Bailey,Mark D. Plumbley,Wenwu Wang

from arxiv, Accepted to the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)

The availability of audio data on sound sharing platforms such as Freesound gives users access to large amounts of annotated audio. Utilising such data for training is becoming increasingly popular, but the problem of label noise that is often prevalent in such datasets requires further investigation. This paper introduces ARCA23K, an Automatically Retrieved and Curated Audio dataset comprised of over 23000 labelled Freesound clips. Unlike past datasets such as FSDKaggle2018 and FSDnoisy18K, ARCA23K facilitates the study of label noise in a more controlled manner. We describe the entire process of creating the dataset such that it is fully reproducible, meaning researchers can extend our work with little effort. We show that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and we refer to this type of label noise as open-set label noise. Experiments are carried out in which we study the impact of label noise in terms of classification performance and representation learning.

翻译：在Freesound等声音共享平台上提供的音频数据使用户能够获取大量附加说明的音频数据。将这类数据用于培训正在变得日益普及,但这类数据集中通常普遍存在的标签噪音问题需要进一步调查。本文介绍ARCAC23K,这是一个自动检索和缩小的音频数据集,由23 000多个贴有标签的音频剪辑组成。与FSDKaggle2018和FSDnoisy18K等过去的数据集不同,ARCAC23K促进以更受控制的方式研究标签噪音。我们描述了创建数据集的整个过程,以便完全可以复制,这意味着研究人员可以不费力地延长我们的工作。我们表明,ARCC23K的大多数标签误差是校外音短片,我们把这类标签噪音称为开立标签噪音。进行实验是为了研究标签噪音在分类表现和代表性学习方面的影响。

0

相关内容

【干货书】计算成像，483页pdf，Computational Imaging Book, MIT 出版社

专知会员服务

66+阅读 · 2021年9月12日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

上百份文字的检测与识别资源，包含数据集、code和paper

上百份文字的检测与识别资源，包含数据集、code和paper

数据挖掘入门与实战

17+阅读 · 2017年12月7日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Moving Object Detection for Event-based vision using Graph Spectral Clustering

Arxiv

0+阅读 · 2021年11月9日

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Arxiv

0+阅读 · 2021年11月7日

Dynamic Data Augmentation with Gating Networks

Arxiv

0+阅读 · 2021年11月5日

A Theory of Label Propagation for Subpopulation Shift

Arxiv

7+阅读 · 2021年2月22日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

Improving Few-shot Text Classification via Pretrained Language Representations

Arxiv

3+阅读 · 2019年8月22日

Learning a Deep ConvNet for Multi-label Classification with Partial Labels

Learning a Deep ConvNet for Multi-label Classification with Partial Labels

Arxiv

6+阅读 · 2019年2月26日

Deep Metric Transfer for Label Propagation with Limited Annotated Data

Arxiv

3+阅读 · 2018年12月20日

Unsupervised Meta-Learning for Reinforcement Learning

Arxiv

8+阅读 · 2018年6月12日

Topic Modelling of Everyday Sexism Project Entries

Arxiv

3+阅读 · 2018年4月5日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

【干货书】计算成像，483页pdf，Computational Imaging Book, MIT 出版社

专知会员服务

66+阅读 · 2021年9月12日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《利用射频传感器载荷增强无人机的侦察、监视与目标获取（ISR）能力》报告

《导航战》2025最新报告

人工智能驱动的国防战术通信与网络：提升现代战争中的态势感知、安全性与自主决策 | 万字长文

《有人-无人轻型驱逐舰与中型无人水面艇支队在第二与第一岛链作战中的部署概念（CONOPS）》56页报告

相关资讯

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

上百份文字的检测与识别资源，包含数据集、code和paper

上百份文字的检测与识别资源，包含数据集、code和paper

数据挖掘入门与实战

17+阅读 · 2017年12月7日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Moving Object Detection for Event-based vision using Graph Spectral Clustering

Arxiv

0+阅读 · 2021年11月9日

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Arxiv

0+阅读 · 2021年11月7日

Dynamic Data Augmentation with Gating Networks

Arxiv

0+阅读 · 2021年11月5日

A Theory of Label Propagation for Subpopulation Shift

Arxiv

7+阅读 · 2021年2月22日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

Improving Few-shot Text Classification via Pretrained Language Representations

Arxiv

3+阅读 · 2019年8月22日

Learning a Deep ConvNet for Multi-label Classification with Partial Labels

Learning a Deep ConvNet for Multi-label Classification with Partial Labels

Arxiv

6+阅读 · 2019年2月26日

Deep Metric Transfer for Label Propagation with Limited Annotated Data

Arxiv

3+阅读 · 2018年12月20日

Unsupervised Meta-Learning for Reinforcement Learning

Arxiv

8+阅读 · 2018年6月12日

Topic Modelling of Everyday Sexism Project Entries

Arxiv

3+阅读 · 2018年4月5日

微信扫码咨询专知VIP会员