In data-driven systems, data exploration is imperative for making real-time decisions. However, big data is stored in massive databases from which it is difficult to retrieve. Approximate Query Processing (AQP) is a technique for providing approximate answers to aggregate queries based on a summary of the data (a synopsis) that closely replicates the behavior of the actual data. It is useful where an approximate answer, obtained in a fraction of the real execution time, is acceptable. In this paper, we discuss the use of Generative Adversarial Networks (GANs) for generating tabular data that can be employed in AQP for synopsis construction. We first discuss the challenges associated with constructing synopses in relational databases and then introduce solutions to those challenges. Following that, we organize statistical metrics for evaluating the quality of the generated synopses. We conclude that the complexity of tabular data makes it difficult for algorithms to learn relational database semantics during training, but that improved versions of tabular GANs are capable of constructing synopses to revolutionize data-driven decision-making systems.
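To make the idea of answering an aggregate query from a synopsis concrete, the following is a minimal sketch. It is not the GAN-based approach discussed in this paper: as a stand-in, the synopsis here is a plain uniform random sample, and the table, the column layout `(region, amount)`, and the helper names `build_synopsis`, `approx_sum`, and `exact_sum` are all hypothetical and chosen only for illustration.

```python
import random

def build_synopsis(rows, fraction, seed=0):
    """Draw a uniform random sample of the rows to act as the synopsis.

    Assumption: a uniform sample stands in for a learned (e.g. GAN-generated) synopsis.
    """
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)

def approx_sum(synopsis, total_rows, predicate, value):
    """Estimate SUM(value) WHERE predicate by scaling the sample sum up to the table size."""
    scale = total_rows / len(synopsis)
    return scale * sum(value(r) for r in synopsis if predicate(r))

def exact_sum(rows, predicate, value):
    """Compute the same aggregate over the full table, for comparison."""
    return sum(value(r) for r in rows if predicate(r))

# Hypothetical table of (region, amount) rows.
data_rng = random.Random(42)
table = [(data_rng.choice("ABCD"), data_rng.uniform(0, 100)) for _ in range(100_000)]

# The synopsis holds 1% of the rows.
synopsis = build_synopsis(table, fraction=0.01)

is_region_a = lambda r: r[0] == "A"
amount = lambda r: r[1]

# Roughly: SELECT SUM(amount) FROM table WHERE region = 'A'
estimate = approx_sum(synopsis, len(table), is_region_a, amount)
truth = exact_sum(table, is_region_a, amount)
relative_error = abs(estimate - truth) / truth

print(f"estimate={estimate:.1f} truth={truth:.1f} relative_error={relative_error:.3f}")
```

The query touches only the synopsis rows, which is where the reduction in execution time comes from; the price is an estimation error that depends on the synopsis size and on how selective the predicate is. A generative synopsis would replace `build_synopsis` with sampling from a trained model, while the query side could stay the same.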