The automatic discovery of functional dependencies (FDs) has been widely studied as one of the hardest problems in data profiling. Existing approaches have focused on making the FD computation efficient while inspecting a single relation at a time. In this paper, we address, for the first time, the problem of inferring FDs for multiple relations as they occur in integrated views, using solely the functional dependencies of the base relations of the view itself. To this end, we leverage logical inference and selective mining, and show that we can discover most of the exact FDs from the base relations and avoid the full computation of the FDs for the integrated view itself, while at the same time preserving the lineage of the FDs of the base relations. We propose algorithms that speed up the inferred FD discovery process and mine FDs on the fly only from the necessary data partitions. We present InFine (INferred FunctIoNal dEpendency), an end-to-end solution that discovers inferred FDs on integrated views by leveraging provenance information of the base relations. Our experiments on a range of real-world and synthetic datasets demonstrate the benefits of our method over existing FD discovery methods, which need to rerun the discovery process on the view from scratch and cannot exploit lineage information on the FDs. We show that InFine outperforms traditional methods, which require the full integrated view computation, by one to two orders of magnitude in terms of runtime. It is also the most memory-efficient method, and it preserves FD provenance information while relying mainly on inference from the base tables, which takes negligible execution time.
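As a minimal illustration of the idea of inferring view-level FDs from base-relation FDs, the following Python sketch combines the FDs of two toy relations by attribute closure and then checks the inferred FD against the materialized inner join. It is not the InFine algorithm: the relations `orders` and `customers`, their FDs, and the helper names `holds`, `closure`, and `inner_join` are all hypothetical and chosen only for this example. The sketch relies on the fact that an exact FD of a base relation continues to hold in an inner join over that relation, so FDs from both sides can be chained by transitivity.

```python
from itertools import product


def holds(rows, lhs, rhs):
    """Return True if the exact FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key is stored; any later mismatch violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def inner_join(left, right, key):
    """Naive inner equi-join of two lists of dicts on a shared attribute."""
    return [{**l, **r} for l, r in product(left, right) if l[key] == r[key]]


# Hypothetical base relations (toy data, assumed for illustration only).
orders = [
    {"order_id": 1, "customer_id": "c1"},
    {"order_id": 2, "customer_id": "c1"},
    {"order_id": 3, "customer_id": "c2"},
]
customers = [
    {"customer_id": "c1", "city": "Paris"},
    {"customer_id": "c2", "city": "Rome"},
]

# FDs assumed to be already known for each base relation.
base_fds = [
    (frozenset({"order_id"}), frozenset({"customer_id"})),  # from orders
    (frozenset({"customer_id"}), frozenset({"city"})),      # from customers
]

# Logical inference: which attributes does order_id determine on the view?
inferred = closure({"order_id"}, base_fds)

# Cross-check against the materialized view. Avoiding this materialization
# is the point of inference; it is done here only to validate the toy case.
view = inner_join(orders, customers, "customer_id")
fd_holds_on_view = holds(view, ["order_id"], ["city"])
reverse_holds_on_view = holds(view, ["city"], ["order_id"])
```

In this toy case, `order_id -> city` follows from the two base FDs without scanning the view, while `city -> order_id` is neither inferred nor valid on the joined data. FDs on a view that do not follow logically from the base FDs cannot be found this way; they are the part for which some form of mining on the data remains necessary.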