Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing - 专知论文

March 29, 2021

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

Xi Liang, Stavros Sintos, Zechao Shang, Sanjay Krishnan

Sample-based approximate query processing (AQP) suffers from many pitfalls such as the inability to answer very selective queries and unreliable confidence intervals when sample sizes are small. Recent research presented an intriguing solution of combining materialized, pre-computed aggregates with sampling for accurate and more reliable AQP. We explore this solution in detail in this work and propose an AQP physical design called PASS, or Precomputation-Assisted Stratified Sampling. PASS builds a tree of partial aggregates that cover different partitions of the dataset. The leaf nodes of this tree form the strata for stratified samples. Aggregate queries whose predicates align with the partitions (or unions of partitions) are exactly answered with a depth-first search, and any partial overlaps are approximated with the stratified samples. We propose an algorithm for optimally partitioning the data into such a data structure with various practical approximation techniques.
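The structure described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the names `Node`, `build`, and `approx_sum` are hypothetical, the data is one-dimensional, the tree is built with simple equal-count binary splits rather than the optimal partitioning algorithm the paper proposes, and confidence intervals are omitted. It shows only the query path: nodes fully covered by the predicate contribute their precomputed aggregate exactly, and leaves that partially overlap the predicate are estimated from their stratified sample.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Row = Tuple[float, float]  # (predicate attribute, value to aggregate)


@dataclass
class Node:
    lo: float            # smallest predicate attribute in this partition
    hi: float            # largest predicate attribute in this partition
    total: float         # precomputed SUM over the partition
    count: int           # precomputed COUNT over the partition
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    sample: List[Row] = field(default_factory=list)  # filled for leaves only


def build(rows: List[Row], leaf_size: int, sample_size: int,
          rng: random.Random) -> Node:
    """Build the aggregate tree over rows sorted by predicate attribute.

    Assumption: equal-count binary splits stand in for the paper's
    optimized partitioning.
    """
    node = Node(lo=rows[0][0], hi=rows[-1][0],
                total=sum(v for _, v in rows), count=len(rows))
    if len(rows) <= leaf_size:
        # Each leaf is one stratum; keep a uniform sample of its rows.
        node.sample = rng.sample(rows, min(sample_size, len(rows)))
    else:
        mid = len(rows) // 2
        node.left = build(rows[:mid], leaf_size, sample_size, rng)
        node.right = build(rows[mid:], leaf_size, sample_size, rng)
    return node


def approx_sum(node: Node, a: float, b: float) -> float:
    """Estimate SUM(value) over rows with a <= attribute <= b via DFS."""
    if node.hi < a or node.lo > b:
        return 0.0                      # partition is disjoint from the predicate
    if a <= node.lo and node.hi <= b:
        return node.total               # fully covered: exact precomputed aggregate
    if node.left is None:
        # Partially overlapping leaf: scale the matching sampled values
        # by (stratum size / sample size).
        scale = node.count / len(node.sample)
        return scale * sum(v for k, v in node.sample if a <= k <= b)
    return approx_sum(node.left, a, b) + approx_sum(node.right, a, b)
```

In this sketch a range that lines up with leaf boundaries never touches a sample, so its answer is exact; any other range pays sampling error only in the leaves it cuts through, at most two for a one-dimensional range.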
