Confident Learning: Estimating Uncertainty in Dataset Labels


August 22, 2022


Curtis G. Northcutt, Lu Jiang, Isaac L. Chuang

From arXiv; published in the Journal of Artificial Intelligence Research (JAIR)

Learning exists in the context of data, yet notions of confidence typically focus on model predictions, not label quality. Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence. Whereas numerous studies have developed these principles independently, here, we combine them, building on the assumption of a class-conditional noise process to directly estimate the joint distribution between noisy (given) labels and uncorrupted (unknown) labels. This results in a generalized CL which is provably consistent and experimentally performant. We present sufficient conditions where CL exactly finds label errors, and show CL performance exceeding seven recent competitive approaches for learning with noisy labels on the CIFAR dataset. Uniquely, the CL framework is not coupled to a specific data modality or model (e.g., we use CL to find several label errors in the presumed error-free MNIST dataset and improve sentiment classification on text data in Amazon Reviews). We also employ CL on ImageNet to quantify ontological class overlap (e.g., estimating 645 "missile" images are mislabeled as their parent class "projectile"), and moderately increase model accuracy (e.g., for ResNet) by cleaning data prior to training. These results are replicable using the open-source cleanlab release.
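The "counting with probabilistic thresholds" step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the cleanlab implementation: the function names `confident_joint` and `candidate_label_errors` and the toy data are hypothetical, and the sketch assumes out-of-sample predicted probabilities are already available and that every class appears at least once among the given labels. Each class threshold is taken as the mean predicted probability of that class over the examples carrying that given label; an example is then counted under the class it confidently belongs to.

```python
import numpy as np


def confident_joint(labels, pred_probs):
    """Count examples by (given label, confidently predicted label).

    labels: given (possibly noisy) integer labels, shape (n,)
    pred_probs: predicted class probabilities, shape (n, k)
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n_classes = pred_probs.shape[1]
    # Per-class threshold: mean predicted probability of class j
    # over the examples whose given label is j.
    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(n_classes)]
    )
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for given, probs in zip(labels, pred_probs):
        above = np.flatnonzero(probs >= thresholds)
        if above.size == 0:
            continue  # not confidently in any class, so left uncounted
        # If several classes clear their thresholds, keep the most probable.
        guess = above[np.argmax(probs[above])]
        counts[given, guess] += 1
    return counts, thresholds


def candidate_label_errors(labels, pred_probs):
    """Indices whose confident class disagrees with the given label."""
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    _, thresholds = confident_joint(labels, pred_probs)
    # Mask out probabilities that fall below their class threshold.
    masked = np.where(pred_probs >= thresholds, pred_probs, -np.inf)
    guess = masked.argmax(axis=1)
    confident = np.isfinite(masked.max(axis=1))
    return np.flatnonzero(confident & (guess != labels))


if __name__ == "__main__":
    # Toy data (hypothetical): the third example is labeled 0 but looks like 1.
    toy_labels = [0, 0, 0, 1, 1, 1]
    toy_probs = [
        [0.90, 0.10],
        [0.80, 0.20],
        [0.10, 0.90],
        [0.10, 0.90],
        [0.30, 0.70],
        [0.15, 0.85],
    ]
    counts, thresholds = confident_joint(toy_labels, toy_probs)
    print(counts)
    print(candidate_label_errors(toy_labels, toy_probs))
```

Off-diagonal entries of the count matrix are the candidate label errors. The paper goes on to calibrate and normalize such counts into an estimate of the joint distribution between noisy and true labels, and to rank and prune examples before retraining; those steps are omitted from this sketch.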

