自动编码器数据清理的概率数据库方法 (A probabilistic database approach to autoencoder-based data cleaning) - 专知论文

会员服务 ·

0

可辨认的 · 属性值 · 自编码器 · Performance · 噪声 ·

2021 年 8 月 3 日

A probabilistic database approach to autoencoder-based data cleaning

翻译：自动编码器数据清理的概率数据库方法

R. R. Mauritz,F. P. J. Nijweide,J. Goseling,M. van Keulen

from arxiv, Submitted to ACM Journal of Data and Information Quality, Special Issue on Deep Learning for Data Quality. Updated with changes made during peer-review process

Data quality problems are a large threat in data science. In this paper, we propose a data-cleaning autoencoder capable of near-automatic data quality improvement. It learns the structure and dependencies in the data and uses it as evidence to identify and correct doubtful values. We apply a probabilistic database approach to represent weak and strong evidence for attribute value repairs. A theoretical framework is provided, and experiments show that it can remove significant amounts of noise (i.e., data quality problems) from categorical and numeric probabilistic data. Our method does not require clean data. We do, however, show that manually cleaning a small fraction of the data significantly improves performance.

翻译：数据质量问题是数据科学的一大威胁。在本文中,我们建议建立一个能够近自动数据质量改进的数据清理自动编码器。它学习数据的结构和依赖性,并将其作为证据来识别和纠正可疑值。我们采用概率数据库方法来代表薄弱和有力的属性价值修复证据。我们提供了理论框架,实验表明它能够从绝对和数字概率数据中去除大量噪音(即数据质量问题)。我们的方法不需要清洁数据。但我们确实表明,人工清理数据中的一小部分能显著改善性能。

0

相关内容

可辨认的

深度概率图模型，Deep Probabilistic Models

专知会员服务

29+阅读 · 2021年8月2日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【干货书】Python程序员编程，810页pdf，Python® for Programmers

【干货书】Python程序员编程，810页pdf，Python® for Programmers

专知会员服务

62+阅读 · 2020年8月6日

数据科学导论，54页ppt，Introduction to Data Science

数据科学导论，54页ppt，Introduction to Data Science

专知会员服务

42+阅读 · 2020年7月27日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

123+阅读 · 2020年5月30日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

专知会员服务

78+阅读 · 2020年2月3日

【NLP| 推荐文章】知识图谱问答系统的神经网络方法介绍（Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs）

专知会员服务

59+阅读 · 2019年11月24日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

281+阅读 · 2019年10月9日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

13+阅读 · 2019年4月17日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Probabilistic Robust Autoencoders for Anomaly Detection

Probabilistic Robust Autoencoders for Anomaly Detection

Arxiv

1+阅读 · 2021年10月1日

Coo: Rethink Data Anomalies In Databases

Coo: Rethink Data Anomalies In Databases

Arxiv

0+阅读 · 2021年10月1日

Evaluation of statistical approaches for association testing in noisy drug screening data

Arxiv

0+阅读 · 2021年9月30日

Private sampling: a noiseless approach for generating differentially private synthetic data

Arxiv

0+阅读 · 2021年9月30日

Efficient Probabilistic Logic Reasoning with Graph Neural Networks

Arxiv

4+阅读 · 2020年2月4日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

Inference in Probabilistic Graphical Models by Graph Neural Networks

Arxiv

3+阅读 · 2018年5月25日

TVAE: Triplet-Based Variational Autoencoder using Metric Learning

Arxiv

3+阅读 · 2018年4月3日

Reinforcement Learning based Recommender System using Biclustering Technique

Arxiv

5+阅读 · 2018年1月17日

TensorLog: Deep Learning Meets Probabilistic DBs

Arxiv

6+阅读 · 2017年7月17日

VIP会员

文章信息

相关主题

相关VIP内容

深度概率图模型，Deep Probabilistic Models

专知会员服务

29+阅读 · 2021年8月2日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【干货书】Python程序员编程，810页pdf，Python® for Programmers

【干货书】Python程序员编程，810页pdf，Python® for Programmers

专知会员服务

62+阅读 · 2020年8月6日

数据科学导论，54页ppt，Introduction to Data Science

数据科学导论，54页ppt，Introduction to Data Science

专知会员服务

42+阅读 · 2020年7月27日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

123+阅读 · 2020年5月30日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

专知会员服务

78+阅读 · 2020年2月3日

【NLP| 推荐文章】知识图谱问答系统的神经网络方法介绍（Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs）

专知会员服务

59+阅读 · 2019年11月24日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

281+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

最新《扩散模型原理》新书，470页pdf

无人机作战：演进、创新与未来战场

AI 智能体简史

多模态空间推理在大模型时代：综述与基准测试

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

13+阅读 · 2019年4月17日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Probabilistic Robust Autoencoders for Anomaly Detection

Probabilistic Robust Autoencoders for Anomaly Detection

Arxiv

1+阅读 · 2021年10月1日

Coo: Rethink Data Anomalies In Databases

Coo: Rethink Data Anomalies In Databases

Arxiv

0+阅读 · 2021年10月1日

Evaluation of statistical approaches for association testing in noisy drug screening data

Arxiv

0+阅读 · 2021年9月30日

Private sampling: a noiseless approach for generating differentially private synthetic data

Arxiv

0+阅读 · 2021年9月30日

Efficient Probabilistic Logic Reasoning with Graph Neural Networks

Arxiv

4+阅读 · 2020年2月4日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

Inference in Probabilistic Graphical Models by Graph Neural Networks

Arxiv

3+阅读 · 2018年5月25日

TVAE: Triplet-Based Variational Autoencoder using Metric Learning

Arxiv

3+阅读 · 2018年4月3日

Reinforcement Learning based Recommender System using Biclustering Technique

Arxiv

5+阅读 · 2018年1月17日

TensorLog: Deep Learning Meets Probabilistic DBs

Arxiv

6+阅读 · 2017年7月17日

微信扫码咨询专知VIP会员