分析真实性噪音标签数据噪音模型错误 (Analysing the Noise Model Error for Realistic Noisy Label Data) - 专知论文

会员服务 ·

0

噪声 · 标注 · MoDELS · 估计/估计量 · 期望错误 ·

2021 年 1 月 24 日

Analysing the Noise Model Error for Realistic Noisy Label Data

翻译：分析真实性噪音标签数据噪音模型错误

Michael A. Hedderich,Dawei Zhu,Dietrich Klakow

from arxiv, Accepted at AAAI 2021, additional material at https://github.com/uds-lsv/noise-estimation

Distant and weak supervision allow to obtain large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain a high amount of errors. A popular technique to overcome the negative effects of these noisy labels is noise modelling where the underlying noise process is modelled. In this work, we study the quality of these estimated noise models from the theoretical side by deriving the expected error of the noise model. Apart from evaluating the theoretical results on commonly used synthetic noise, we also publish NoisyNER, a new noisy label dataset from the NLP domain that was obtained through a realistic distant supervision technique. It provides seven sets of labels with differing noise patterns to evaluate different noise levels on the same instances. Parallel, clean labels are available making it possible to study scenarios where a small amount of gold-standard data can be leveraged. Our theoretical results and the corresponding experiments give insights into the factors that influence the noise model estimation like the noise distribution and the sampling technique.

翻译：在这项工作中,我们通过推断噪音模型的预期错误,从理论方面研究这些估计噪音模型的质量。除了评估常用合成噪音的理论结果外,我们还出版了NosyNER(NLP域的一个新的噪音标签数据集),该数据集是通过现实的远程监督技术获得的。它提供了七套具有不同噪音模式的标签,以评价同一实例的不同噪音水平。平行的、清洁的标签使得有可能研究能够利用少量黄金标准数据的设想。我们的理论结果和相应的实验揭示了影响噪音模型估计的因素,例如噪音分布和取样技术。

0

相关内容

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【2020新书】概率机器学习，附212页pdf与slides

【2020新书】概率机器学习，附212页pdf与slides

专知会员服务

111+阅读 · 2020年11月12日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

6+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Program Synthesis Over Noisy Data with Guarantees

Arxiv

0+阅读 · 2021年3月20日

Backfitting for large scale crossed random effects regressions

Arxiv

0+阅读 · 2021年3月18日

The effect of Informative Selection on the estimation of parameters related to Spatial Processes

Arxiv

0+阅读 · 2021年3月18日

Toward Better Use of Data in Linear Bandits

Arxiv

0+阅读 · 2021年3月18日

A Probabilistic State Space Model for Joint Inference from Differential Equations and Data

Arxiv

1+阅读 · 2021年3月18日

Estimating FARIMA models with uncorrelated but non-independent error terms

Arxiv

0+阅读 · 2021年3月18日

Probabilistic Simplex Component Analysis

Arxiv

0+阅读 · 2021年3月18日

Probabilistic Logic Neural Networks for Reasoning

Arxiv

7+阅读 · 2019年6月20日

Generating Realistic Geology Conditioned on Physical Measurements with Generative Adversarial Networks

Generating Realistic Geology Conditioned on Physical Measurements with Generative Adversarial Networks

Arxiv

6+阅读 · 2018年7月5日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【2020新书】概率机器学习，附212页pdf与slides

【2020新书】概率机器学习，附212页pdf与slides

专知会员服务

111+阅读 · 2020年11月12日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《全谱战争——从拓宽工具到思考不可思考之事》

《FPV武装无人机的战斗飞行艺术与科学》最新报告

无人机作战：演进、创新与未来战场

《反无人机：用于无人机探测与定位的多输入多输出雷达》最新69页

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

6+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

相关论文

Program Synthesis Over Noisy Data with Guarantees

Arxiv

0+阅读 · 2021年3月20日

Backfitting for large scale crossed random effects regressions

Arxiv

0+阅读 · 2021年3月18日

The effect of Informative Selection on the estimation of parameters related to Spatial Processes

Arxiv

0+阅读 · 2021年3月18日

Toward Better Use of Data in Linear Bandits

Arxiv

0+阅读 · 2021年3月18日

A Probabilistic State Space Model for Joint Inference from Differential Equations and Data

Arxiv

1+阅读 · 2021年3月18日

Estimating FARIMA models with uncorrelated but non-independent error terms

Arxiv

0+阅读 · 2021年3月18日

Probabilistic Simplex Component Analysis

Arxiv

0+阅读 · 2021年3月18日

Probabilistic Logic Neural Networks for Reasoning

Arxiv

7+阅读 · 2019年6月20日

Generating Realistic Geology Conditioned on Physical Measurements with Generative Adversarial Networks

Generating Realistic Geology Conditioned on Physical Measurements with Generative Adversarial Networks

Arxiv

6+阅读 · 2018年7月5日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

微信扫码咨询专知VIP会员