Discovery and Contextual Data Cleaning with Ontology Functional Dependencies - 专知 (Zhuanzhi) Papers

May 24, 2021

Discovery and Contextual Data Cleaning with Ontology Functional Dependencies

Zheng Zheng, Longtao Zheng, Fei Chiang, Lukasz Golab, Jaroslaw Szlichta

Functional Dependencies (FDs) define attribute relationships based on syntactic equality, and, when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We explore dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships such as synonyms and is-a hierarchies defined by an ontology. We study the theoretical foundations for OFDs, including sound and complete axioms and a linear-time inference procedure. We then propose an algorithm for discovering OFDs (exact ones and ones that hold with some exceptions) from data that uses the axioms to prune the search space. Towards enabling OFDs as data quality rules in practice, we study the problem of finding minimal repairs to a relation and ontology with respect to a set of OFDs. We demonstrate the effectiveness of our techniques on real datasets, and show that OFDs can significantly reduce the number of false positive errors in data cleaning techniques that rely on traditional FDs.
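The contrast between syntactic FDs and synonym OFDs described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a simplified reading of the synonym semantics: a group of tuples that agree on the left-hand side is consistent if at least one ontology class (sense) contains all of its right-hand-side values. The function names, the `senses` mapping, the class identifiers such as `c_us`, and the toy rows are all hypothetical.

```python
from collections import defaultdict


def group_rhs_values(rows, lhs, rhs):
    """Collect the set of rhs values seen for each combination of lhs values."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].add(row[rhs])
    return groups


def fd_violations(rows, lhs, rhs):
    """Traditional FD check: any two distinct rhs strings count as a violation."""
    groups = group_rhs_values(rows, lhs, rhs)
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


def synonym_ofd_violations(rows, lhs, rhs, senses):
    """Simplified synonym OFD check (an assumption, not the paper's procedure).

    `senses` maps a value to the set of ontology classes it may name.
    A group is consistent if some class is shared by all of its rhs values.
    Values missing from the ontology are treated as synonyms only of themselves.
    """
    violations = {}
    for key, vals in group_rhs_values(rows, lhs, rhs).items():
        class_sets = [senses.get(v) or {("literal", v)} for v in vals]
        if not set.intersection(*class_sets):
            violations[key] = vals
    return violations


# Hypothetical toy relation: country code (CC) should determine country (CTRY).
rows = [
    {"CC": "US", "CTRY": "United States"},
    {"CC": "US", "CTRY": "USA"},
    {"CC": "IN", "CTRY": "India"},
    {"CC": "IN", "CTRY": "Bharat"},
    {"CC": "CA", "CTRY": "Canada"},
    {"CC": "CA", "CTRY": "Mexico"},
]

# Hypothetical ontology: each value names one class; synonyms share a class.
senses = {
    "United States": {"c_us"},
    "USA": {"c_us"},
    "India": {"c_in"},
    "Bharat": {"c_in"},
    "Canada": {"c_ca"},
    "Mexico": {"c_mx"},
}

fd_errors = fd_violations(rows, ["CC"], "CTRY")
ofd_errors = synonym_ofd_violations(rows, ["CC"], "CTRY", senses)
print(sorted(fd_errors))
print(sorted(ofd_errors))
```

In this toy data the FD check flags all three country codes, while the synonym check flags only the CA group, where Canada and Mexico share no class. The US and IN groups are the kind of false positives the abstract refers to.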
