Functional Dependencies (FDs) define attribute relationships based on syntactic equality, and, when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We explore dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships such as synonyms and is-a hierarchies defined by an ontology. We study the theoretical foundations of OFDs, including sound and complete axioms and a linear-time inference procedure. We then propose an algorithm for discovering OFDs (both exact ones and ones that hold with some exceptions) from data; the algorithm uses the axioms to prune the search space. Towards enabling OFDs as data quality rules in practice, we study the problem of finding minimal repairs to a relation and ontology with respect to a set of OFDs. We demonstrate the effectiveness of our techniques on real datasets, and show that OFDs can significantly reduce the number of false positive errors in data cleaning techniques that rely on traditional FDs.
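The contrast between FDs and synonym OFDs can be illustrated with a minimal sketch. It is not the paper's algorithm: the toy ontology `SENSES`, the sample rows, and the function names are all hypothetical, and the check assumes one plausible reading of synonym semantics, namely that tuples agreeing on the left-hand side satisfy the dependency if their right-hand-side values share at least one common sense in the ontology.

```python
from collections import defaultdict

# Hypothetical toy ontology: each value maps to the set of senses it can name.
SENSES = {
    "USA": {"united_states"},
    "United States": {"united_states"},
    "America": {"united_states"},
    "Canada": {"canada"},
}


def group_by(rows, lhs):
    """Partition rows into groups that agree on all left-hand-side attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    return groups


def fd_violations(rows, lhs, rhs):
    """Traditional FD check: a group violates lhs -> rhs if it contains
    more than one syntactically distinct rhs value."""
    return [key for key, grp in group_by(rows, lhs).items()
            if len({r[rhs] for r in grp}) > 1]


def synonym_ofd_violations(rows, lhs, rhs, senses):
    """Assumed synonym semantics: a group violates the dependency only if
    its rhs values have no sense in common. Values missing from the
    ontology are treated as naming only themselves."""
    bad = []
    for key, grp in group_by(rows, lhs).items():
        values = {r[rhs] for r in grp}
        common = set.intersection(*(senses.get(v, {v}) for v in values))
        if not common:
            bad.append(key)
    return bad


rows = [
    {"code": "US", "country": "USA"},
    {"code": "US", "country": "United States"},
    {"code": "CA", "country": "Canada"},
    {"code": "CA", "country": "America"},
]

fd_bad = fd_violations(rows, ["code"], "country")
ofd_bad = synonym_ofd_violations(rows, ["code"], "country", SENSES)
print("FD violations: ", fd_bad)
print("OFD violations:", ofd_bad)
```

In this toy data the FD check flags both country codes, while the synonym-aware check flags only the `CA` group, whose values share no sense. The `US` group is the kind of false positive the abstract refers to.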