Modern high-load applications store data across multiple database instances. Such an architecture requires data consistency, and it is important to ensure an even distribution of data among nodes. Load balancing is used to achieve these goals, and hashing is the backbone of virtually all load balancing systems. Since the introduction of classic Consistent Hashing, many algorithms have been devised for this purpose. One of the goals of a load balancer is to ensure the scalability of the storage cluster. For the performance of the whole system, it is crucial to transfer as few data records as possible when a node is added or removed, and the hashing algorithm used by the load balancer has the greatest impact on this process. In this paper, we experimentally evaluate several hashing algorithms used for load balancing, conducting both simulated and real-system experiments. To evaluate algorithm performance, we have developed a benchmark suite based on Unidata MDM, a scalable toolkit for various Master Data Management (MDM) applications. For assessment, we have employed three criteria: the uniformity of the produced distribution, the number of moved records, and computation speed. Based on the results of our experiments, we have compiled a table in which each algorithm is assessed according to these criteria.
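To make the first two criteria concrete, here is a minimal Python sketch of classic Consistent Hashing (a hash ring with virtual nodes) together with two illustrative metrics: the fraction of records that move when a node is added, and a peak-to-mean load ratio as a uniformity proxy. The choice of MD5 for ring positions, 100 virtual nodes per server, and all names (`ConsistentHashRing`, `moved_fraction`, `peak_to_mean`) are assumptions made for illustration; they are not taken from the benchmark suite, and the metric definitions used there may differ.

```python
import bisect
import hashlib
from collections import Counter


def h64(key: str) -> int:
    """Map a string to a 64-bit ring position (MD5 is an illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Classic consistent hashing with `vnodes` virtual points per physical node."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = {}  # ring position -> owning node
        self._pos = []   # sorted ring positions
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = h64(f"{node}#{i}")
            if pos not in self._ring:  # on a (rare) collision, keep the first owner
                self._ring[pos] = node
                bisect.insort(self._pos, pos)

    def remove_node(self, node):
        self._ring = {p: n for p, n in self._ring.items() if n != node}
        self._pos = sorted(self._ring)

    def nodes(self):
        return set(self._ring.values())

    def lookup(self, key):
        # The owner is the first virtual point clockwise from hash(key),
        # wrapping around to the start of the ring after the last point.
        i = bisect.bisect_right(self._pos, h64(key)) % len(self._pos)
        return self._ring[self._pos[i]]


def moved_fraction(before, after, keys):
    """Share of records whose owner changes between two ring configurations."""
    return sum(before.lookup(k) != after.lookup(k) for k in keys) / len(keys)


def peak_to_mean(ring, keys):
    """Uniformity proxy: busiest node's load over the mean load (1.0 is perfectly even)."""
    loads = Counter(ring.lookup(k) for k in keys)
    return max(loads.values()) / (len(keys) / len(ring.nodes()))


keys = [f"record-{i}" for i in range(20_000)]
before = ConsistentHashRing([f"node{i}" for i in range(4)])
after = ConsistentHashRing([f"node{i}" for i in range(5)])
print(f"moved on adding node4: {moved_fraction(before, after, keys):.3f}")  # ideal is 1/5
print(f"peak-to-mean load, 4 nodes: {peak_to_mean(before, keys):.3f}")
```

Because adding `node4` only inserts new points into the ring, every record that moves is reassigned to `node4`; this is the minimal-disruption property that makes the number of moved records a meaningful criterion. Computation speed, the third criterion, is not measured in this sketch.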