Not all data in a typical training set help with generalization; some samples can be overly ambiguous or outright mislabeled. This paper introduces a new method to identify such samples and mitigate their impact when training neural networks. At the heart of our algorithm is the Area Under the Margin (AUM) statistic, which exploits differences in the training dynamics of clean and mislabeled samples. A simple procedure, adding an extra class populated with purposefully mislabeled threshold samples, learns an AUM upper bound that isolates mislabeled data. This approach consistently improves upon prior work on synthetic and real-world datasets. On the WebVision50 classification task our method removes 17% of training data, yielding a 1.6% (absolute) improvement in test error. On CIFAR100 removing 13% of the data leads to a 1.2% drop in error.
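To make the AUM statistic concrete, the sketch below computes it for one sample under a common reading of the abstract: the margin at each epoch is the assigned-class logit minus the largest other-class logit, and AUM averages this margin over epochs. All names and shapes here (`margin`, `area_under_margin`, `logits_per_epoch`) are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of the Area Under the Margin (AUM) statistic, assuming
# logits for each sample are recorded once per training epoch.
import numpy as np

def margin(logits: np.ndarray, label: int) -> float:
    """Assigned-class logit minus the largest non-assigned logit."""
    assigned = logits[label]
    others = np.delete(logits, label)
    return float(assigned - others.max())

def area_under_margin(logits_per_epoch: np.ndarray, label: int) -> float:
    """Average one sample's margin across training epochs.

    logits_per_epoch: shape (num_epochs, num_classes), the model's logits
    for this sample recorded at each epoch.
    """
    return float(np.mean([margin(l, label) for l in logits_per_epoch]))

# Toy usage: mislabeled samples tend to accumulate low (often negative)
# AUM, since a competing class logit dominates the assigned one for many
# epochs, while clean samples accumulate a high positive AUM.
rng = np.random.default_rng(0)
recorded_logits = rng.normal(size=(10, 5))  # 10 epochs, 5 classes
print(area_under_margin(recorded_logits, label=2))
```

In the paper's procedure as summarized above, the threshold samples (real inputs deliberately assigned to an extra, fake class) are by construction mislabeled, so their AUM values provide a data-driven upper bound: genuine samples scoring below that bound are flagged as likely mislabeled.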