Shared-nothing architecture has been widely adopted in various commercial distributed RDBMSs. Thanks to the architecture, query can be processed in parallel and accelerated by scaling up the cluster horizontally on demand. In spite of that, load balancing has been a challenging issue in all distributed RDBMSs, including shared-nothing ones, which suffers much from skewed data distribution. In this work, we focus on one of the representative operator, namely Hash Join, and investigate how skewness among the nodes of a cluster will affect the load balance and eventual efficiency of an arbitrary query in shared-nothing RDBMSs. We found that existing Distributed Hash Join (Dist-HJ) solutions may not provide satisfactory performance when a value is skewed in both the probe and build tables. To address that, we propose a novel Dist-HJ solution, namely Partition and Replication (PnR). Although PnR provide the best efficiency in some skewness scenario, our exhaustive experiments over a group of shared-nothing RDBMSs show that there is not a single Dist-HJ solution that wins in all (data skew) scenarios. To this end, we further propose a self-adaptive Dist-HJ solution with a builtin sub-operator cost model that dynamically select the best Dist-HJ implementation strategy at runtime according to the data skew of the target query. We implement the solution in our commercial shared-nothing RDBMSs, namely KaiwuDB (former name ZNBase) and empirical study justifies that the self-adaptive model achieves the best performance comparing to a series of solution adopted in many existing RDBMSs.
翻译:在各种商业分布式的 RDBMS 中广泛采用了共享架构。 由于这个架构, 查询可以平行处理, 并通过按需求横向扩大组群来加速。 尽管如此, 在所有分布式的 RDBMS 中, 包括共享- 无共享的系统中, 包括共享- 无共享的系统, 都因数据分布偏斜而有很大问题。 在这项工作中, 我们集中关注一个具有代表性的操作者之一, 即 Hash joint, 并调查集群节点的偏差将如何影响共享的 RDBMS 中任意查询的负平衡和最终效率。 我们发现, 现有的 分配式 HS- 共享的 Hash 联合( Dist- HJ) 解决方案在探测和构建表格中都存在一个挑战性的问题, 包括共享- 无共享- 共享- 共享- 共享- 共享- RDBMS 的 解决方案。 为了解决这个问题, 我们提议在动态- 运行式的 RDB- RDMS 模式中采用最佳- dead- droad 模式。</s>