The goal of Approximate Query Processing (AQP) is to provide very fast but "accurate enough" results for costly aggregate queries, thereby improving the user experience in interactive exploration of large datasets. Recently proposed machine-learning-based AQP techniques can provide very low latency, since query execution involves only model inference rather than traditional query processing on database clusters. However, as the number of filtering predicates (WHERE clauses) increases, the approximation error of these methods increases significantly. Analysts often use queries with a large number of predicates for insight discovery, so maintaining low approximation error is important to prevent them from drawing misleading conclusions. In this paper, we propose ELECTRA, a predicate-aware AQP system that can answer analytics-style queries with a large number of predicates with much smaller approximation errors. ELECTRA uses a conditional generative model that learns the conditional distribution of the data and, at runtime, generates a small (~1000 rows) but representative sample on which the query is executed to compute the approximate result. Our evaluations against four different baselines on three real-world datasets show that ELECTRA provides lower AQP error than the baselines for queries with a large number of predicates.
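To illustrate why predicate awareness matters, the following is a minimal Python sketch, not ELECTRA's implementation. It compares a predicate-agnostic uniform sample of 1000 rows with a sample of 1000 rows drawn conditioned on the WHERE clause, then computes AVG over each. The toy table, its columns, and all helper functions (`make_table`, `conditional_sample`, and so on) are hypothetical. In particular, `conditional_sample` substitutes the empirical conditional distribution of the table for a learned conditional generative model, which a real system would use precisely to avoid scanning the data at query time.

```python
import random
import statistics

# Hypothetical toy schema; not taken from the paper's datasets.
REGIONS = ["north", "south", "east", "west"]
CATEGORIES = ["a", "b", "c", "d", "e"]


def make_table(n, rng):
    """Build a toy table of {region, category, value} rows."""
    rows = []
    for _ in range(n):
        region = rng.choice(REGIONS)
        category = rng.choice(CATEGORIES)
        # The value depends on the filtered columns, so filtered averages differ.
        base = 10 * REGIONS.index(region) + CATEGORIES.index(category)
        rows.append({"region": region, "category": category,
                     "value": base + rng.gauss(0, 1)})
    return rows


def matches(row, predicates):
    """True if the row satisfies every equality predicate (a conjunctive WHERE clause)."""
    return all(row[col] == val for col, val in predicates.items())


def avg_where(rows, predicates):
    """AVG(value) WHERE <predicates> over the given rows; None if nothing matches."""
    vals = [r["value"] for r in rows if matches(r, predicates)]
    return statistics.mean(vals) if vals else None


def uniform_sample(table, k, rng):
    """Predicate-agnostic baseline: k rows drawn uniformly without replacement."""
    return rng.sample(table, k)


def conditional_sample(table, predicates, k, rng):
    """Stand-in for a conditional generative model (assumption): draw k rows
    with replacement from the rows that satisfy the predicates."""
    pool = [r for r in table if matches(r, predicates)]
    return [rng.choice(pool) for _ in range(k)]


rng = random.Random(0)
table = make_table(100_000, rng)
preds = {"region": "east", "category": "c"}
k = 1000  # on the order of the ~1000-row sample size mentioned above

exact = avg_where(table, preds)
uni = uniform_sample(table, k, rng)
cond = conditional_sample(table, preds, k, rng)

n_uni_matches = sum(matches(r, preds) for r in uni)
print(f"exact AVG:               {exact:.3f}")
print(f"uniform-sample estimate: {avg_where(uni, preds):.3f} "
      f"(from {n_uni_matches} of {k} rows)")
print(f"conditional estimate:    {avg_where(cond, preds):.3f} (from {k} rows)")
```

With two equality predicates, only about 1 in 20 uniformly sampled rows survives the filter, and each additional predicate shrinks that fraction further, which is the error growth the abstract describes. The conditional sample keeps all k rows regardless of how many predicates there are. The sketch covers only AVG; COUNT and SUM would also need a selectivity estimate, which it does not model.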