Fine-Tuning Data Structures for Analytical Query Processing

We introduce a framework for automatically choosing data structures to support efficient computation of analytical workloads. Our contributions are twofold. First, we introduce a novel low-level intermediate language that can express the algorithms behind various query processing paradigms, such as classical joins, groupjoin, and in-database machine learning engines. This language is designed around the notion of dictionaries and allows a more fine-grained choice of their low-level implementation. Second, the cost model for alternative implementations is inferred automatically by combining machine learning and program reasoning. The dictionary cost model is a regression model trained on a profiling dataset of dictionary operations on a given hardware architecture. The program cost model is inferred using static program analysis. Our experimental results show the effectiveness of the trained cost model on microbenchmarks. Furthermore, we show that the code generated by our framework either outperforms or is on par with state-of-the-art analytical query engines and a recent in-database machine learning framework.
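To make the dictionary-centric idea concrete, the following is a minimal Python sketch, not the paper's intermediate language or its generated code. It expresses a groupjoin-style query (total order amount per known customer) purely through dictionary `update` and `lookup` operations, so the backing data structure can be swapped without touching the query. The class names `HashDict` and `SortedDict`, the function `groupjoin`, and the sample rows are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


class HashDict:
    """Hypothetical dictionary backed by Python's built-in hash table."""

    def __init__(self):
        self._data = {}

    def update(self, key, value):
        # Add `value` to the entry for `key`, creating the entry if absent.
        self._data[key] = self._data.get(key, 0) + value

    def lookup(self, key):
        # Return the stored value, or None when the key is missing.
        return self._data.get(key)


class SortedDict:
    """Hypothetical dictionary backed by parallel sorted lists and binary search."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def update(self, key, value):
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] += value
        else:
            # Insert at position i so that the keys stay sorted.
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def lookup(self, key):
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None


def groupjoin(customers, orders, dict_impl):
    """Sum order amounts per customer, keeping only customers that have orders."""
    # Aggregation phase: one dictionary entry per customer id seen in `orders`.
    totals = dict_impl()
    for cust_id, amount in orders:
        totals.update(cust_id, amount)
    # Probe phase: look up each customer's total and emit matching rows.
    result = []
    for cust_id, name in customers:
        total = totals.lookup(cust_id)
        if total is not None:
            result.append((name, total))
    return result


# Illustrative rows only; they are not taken from the paper.
customers = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, 10), (2, 5), (1, 7), (4, 3)]

hash_result = groupjoin(customers, orders, HashDict)
sorted_result = groupjoin(customers, orders, SortedDict)
```

Both calls return the same rows. What differs is the cost of each `update` and `lookup`, which is the kind of per-operation cost the abstract says is learned by regression from profiling data on the target hardware.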

