Hybrid Materialization in a Disk-Based Column-Store


April 17, 2023


Evgeniy Klyuchikov, Elena Mikhailova, George Chernishev

In column-oriented query processing, a materialization strategy determines when lightweight positions (row IDs) are translated into tuples. It is an important part of column-store architecture, since it defines the class of supported query plans and therefore impacts overall system performance. In this paper we continue investigating materialization strategies for a distributed, disk-based column-store. We start by demonstrating cases in which existing approaches impose fundamental limitations on the resulting system performance. Then, in order to address them, we propose a new hybrid materialization model. The main feature of hybrid materialization is the ability to manipulate both positions and values at the same time. This way, the query engine can flexibly combine the advantages of all the existing strategies and support a new class of query plans. Moreover, hybrid materialization allows the query engine to customize the materialization policy of individual attributes. We describe our vision of how hybrid materialization can be implemented in a columnar system. As an example, we use PosDB, a distributed, disk-based column-store. We present the necessary data structures and the internals of a hybrid operator, and describe the algebra of such operators. Based on this implementation, we evaluate the performance of late, ultra-late, and hybrid materialization strategies in several scenarios based on TPC-H queries. Our experiments demonstrate that hybrid materialization is almost two times faster than its counterparts, while providing a more flexible query model.
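To make the idea of manipulating positions and values at the same time more concrete, here is a minimal sketch in Python. It is not PosDB's implementation: the toy `COLUMNS` table, the `HybridBlock` class, and the operator functions are all hypothetical names introduced for illustration. The sketch assumes a block that always carries row positions and may additionally carry values for attributes that have already been materialized.

```python
from dataclasses import dataclass, field

# Hypothetical toy table stored column by column (not PosDB's actual layout).
COLUMNS = {
    "price":    [10, 25, 7, 40, 25],
    "quantity": [1, 3, 2, 5, 4],
    "region":   ["eu", "us", "eu", "us", "eu"],
}

@dataclass
class HybridBlock:
    """Carries row positions plus any attribute values materialized so far."""
    positions: list
    values: dict = field(default_factory=dict)

def scan_filter(column, predicate):
    """Position-producing filter: reads one column, emits matching row IDs only."""
    return HybridBlock([i for i, v in enumerate(COLUMNS[column]) if predicate(v)])

def materialize(block, column):
    """Translate positions into values for one attribute, keeping the positions."""
    vals = [COLUMNS[column][p] for p in block.positions]
    return HybridBlock(block.positions, {**block.values, column: vals})

def filter_values(block, column, predicate):
    """Filter on a materialized attribute; positions and values stay aligned."""
    keep = [i for i, v in enumerate(block.values[column]) if predicate(v)]
    return HybridBlock(
        [block.positions[i] for i in keep],
        {name: [vals[i] for i in keep] for name, vals in block.values.items()},
    )

def to_tuples(block, columns):
    """Final materialization: reuse carried values, fetch the rest by position."""
    return [
        tuple(
            block.values[c][i] if c in block.values else COLUMNS[c][p]
            for c in columns
        )
        for i, p in enumerate(block.positions)
    ]

block = scan_filter("region", lambda r: r == "eu")        # positions only
block = materialize(block, "price")                       # positions + price values
block = filter_values(block, "price", lambda x: x >= 10)  # filter on carried values
rows = to_tuples(block, ["price", "quantity"])            # quantity fetched late
print(rows)
```

In this sketch `price` is materialized early because a later operator needs its values, while `quantity` is fetched by position only at the very end. This is one way to read the per-attribute materialization policy mentioned in the abstract, under the assumptions stated above.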



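Column store

A column store is also referred to as DSM (Decomposition Storage Model). Its main difference from NSM (N-ary Storage Model) is that DSM stores the values of the same field from all records together.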
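The difference between the two layouts can be sketched in a few lines of Python. The `records` data and the helper names `to_nsm` and `to_dsm` are assumptions made for illustration; real systems store pages on disk rather than Python lists.

```python
# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "name": "a", "price": 10},
    {"id": 2, "name": "b", "price": 25},
    {"id": 3, "name": "c", "price": 7},
]

def to_nsm(rows):
    """NSM: each record is stored contiguously as one tuple."""
    return [tuple(r.values()) for r in rows]

def to_dsm(rows):
    """DSM: values of the same field from all records are stored together."""
    return {key: [r[key] for r in rows] for key in rows[0]}

nsm = to_nsm(records)
dsm = to_dsm(records)

# Summing one attribute touches every whole tuple in NSM,
# but only a single column in DSM.
total_nsm = sum(row[2] for row in nsm)
total_dsm = sum(dsm["price"])
print(nsm)
print(dsm)
```

Both totals are the same; the point of the sketch is that the DSM version reads only the `price` column, which is why analytical queries over a few attributes tend to favor column-oriented layouts.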
