Automatic Clustering in Hyrise - 专知 Papers

2021 年 3 月 29 日

Automatic Clustering in Hyrise

Alexander Löser

Physical data layout is an important performance factor for modern databases. Clustering, i.e., storing similar values in proximity, can lead to performance gains in several ways. We present an automated model to determine beneficial clustering columns and a clustering algorithm for the column-oriented, memory-resident database Hyrise. To automatically select clustering columns, the model analyzes the database's workload and estimates how much certain clustering columns would affect the workload's latency. We evaluate the precision of the model's estimates, as well as the overall quality of its clustering suggestions. To apply a determined clustering configuration, we developed an online clustering algorithm. The clustering algorithm supports an arbitrary number of clustering dimensions. We show that the algorithm is robust against concurrently running data-modifying queries. We obtain a 5% latency reduction for the TPC-H benchmark when clustering the lineitem table and a 4% latency reduction for the TPC-DS benchmark when clustering the store_sales table.
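
The abstract does not spell out which mechanisms produce the gains from clustering. One plausible mechanism in a chunk-partitioned column store is min/max pruning: if each chunk records the smallest and largest value it holds, a range predicate can skip every chunk whose range does not overlap it. The sketch below is illustrative only and is not the paper's algorithm or Hyrise's API; the names `make_chunks` and `chunks_to_scan`, the chunk size of 100, and the synthetic data are all assumptions made for this example.

```python
from typing import List


def make_chunks(values: List[int], chunk_size: int) -> List[List[int]]:
    """Split a column into fixed-size chunks (hypothetical layout)."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def chunks_to_scan(chunks: List[List[int]], low: int, high: int) -> int:
    """Count chunks whose [min, max] range overlaps the predicate low <= v <= high.

    Chunks that do not overlap can be skipped without reading their values.
    """
    return sum(1 for c in chunks if min(c) <= high and max(c) >= low)


# Deterministic "unclustered" column: the values 0..999 in a scattered order
# (37 and 1000 are coprime, so this is a permutation of 0..999).
unclustered = [(i * 37) % 1000 for i in range(1000)]

# "Clustered" column: the same values, stored so that similar values are adjacent.
clustered = sorted(unclustered)

chunk_size = 100  # assumed chunk size for this example
scanned_unclustered = chunks_to_scan(make_chunks(unclustered, chunk_size), 200, 249)
scanned_clustered = chunks_to_scan(make_chunks(clustered, chunk_size), 200, 249)

print(f"unclustered: scan {scanned_unclustered} of 10 chunks")
print(f"clustered:   scan {scanned_clustered} of 10 chunks")
```

With the scattered layout every chunk contains some value in the queried range, so none can be skipped; after sorting, only the single chunk holding 200 to 299 has to be read. Real workloads filter on several columns at once, which is why choosing the clustering columns is the hard part the abstract's model addresses.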
