In data integration, data quality problems are frequently encountered when extracting, combining, and merging data. The probabilistic data integration approach represents such problems as uncertainties in a probabilistic database. In this paper, we propose a data-cleaning autoencoder capable of near-automatic data quality improvement. It learns the structure and dependencies in the data to identify and correct doubtful values. We provide a theoretical framework, and experiments show that the approach removes significant amounts of noise from both categorical and numeric probabilistic data. Our method does not require clean data; we do, however, show that manually cleaning a small fraction of the data significantly improves performance.
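To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of a data-cleaning autoencoder on numeric data: the network learns the dependencies in the records, cells with large reconstruction error are flagged as doubtful, and flagged cells are replaced by their reconstructions. The class name, the `clean` helper, and the error threshold are illustrative assumptions.

```python
# Minimal sketch, assuming a numeric table and a simple reconstruction-error
# criterion for "doubtful" cells. Names and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class DataCleaningAE(nn.Module):
    def __init__(self, n_features: int, hidden: int = 8):
        super().__init__()
        # Bottleneck forces the model to capture dependencies between columns.
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def clean(model, x, threshold=0.5):
    """Flag cells whose absolute reconstruction error exceeds `threshold`
    (a hypothetical cut-off) and replace them with the reconstruction."""
    with torch.no_grad():
        recon = model(x)
        doubtful = (x - recon).abs() > threshold
        return torch.where(doubtful, recon, x), doubtful

if __name__ == "__main__":
    torch.manual_seed(0)
    # Synthetic data with one dependency: column 1 is roughly 2 * column 0.
    base = torch.rand(256, 1)
    data = torch.cat([base, 2 * base], dim=1)
    noisy = data.clone()
    noisy[::16, 1] += 3.0  # inject occasional corrupted values

    model = DataCleaningAE(n_features=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    # Train on the noisy data itself; no clean data is required, mirroring
    # the setting described in the abstract.
    for _ in range(500):
        opt.zero_grad()
        loss = loss_fn(model(noisy), noisy)
        loss.backward()
        opt.step()

    cleaned, flags = clean(model, noisy)
    print("flagged cells:", int(flags.sum()))
```

Categorical attributes would additionally need an encoding step (e.g. one-hot columns with a softmax reconstruction), which this sketch omits for brevity.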