Despite being one of the oldest data structures in computer science, hash tables continue to be the focus of a great deal of both theoretical and empirical research. A central reason for this is that many of the fundamental properties that one desires from a hash table are difficult to achieve simultaneously; thus many variants offering different trade-offs have been proposed. This paper introduces Iceberg hashing, a hash table that simultaneously offers the strongest known guarantees on a large number of core properties. Iceberg hashing supports constant-time operations while improving on the state of the art for space efficiency, cache efficiency, and low failure probability. Iceberg hashing is also the first hash table to support a load factor of up to $1 - o(1)$ while being stable, meaning that the position where an element is stored only ever changes when resizes occur. In fact, in the setting where keys are $\Theta(\log n)$ bits, the space guarantee that Iceberg hashing offers, namely that it uses at most $\log \binom{|U|}{n} + O(n \log \log n)$ bits to store $n$ items from a universe $U$, matches a lower bound by Demaine et al. that applies to any stable hash table. Iceberg hashing introduces new general-purpose techniques for some of the most basic aspects of hash-table design. Notably, our indirection-free technique for dynamic resizing, which we call waterfall addressing, and our techniques for achieving stability and very-high-probability guarantees can be applied to any hash table that makes use of the front-yard/backyard paradigm for hash-table design.