We introduce PaCHash, a hash table that stores its objects contiguously in an array without intervening space, even if the objects have variable size. In particular, each object can be compressed using standard compression techniques. A small search data structure allows locating the objects in constant expected time. PaCHash is most naturally described as a static external hash table where it needs a constant number of bits of internal memory per block of external memory. However, PaCHash can be dynamized and is also useful for internal memory, having lower space consumption than all previous approaches even when considering only objects of identical size. For example, in some sense it beats a lower bound on the space consumption of k-perfect hashing. An implementation for fast SSDs needs about 5 bits of internal memory per block of external memory, requires only one disk access (of variable length) per search operation and has internal search overhead small compared to the disk access cost.
翻译:我们引入了PaCHash, 这个散列表, 将物体存放在一个没有干扰空间的阵列内, 即使对象大小不同。 特别是, 每个对象都可以使用标准的压缩技术压缩。 一个小的搜索数据结构可以在固定的预期时间内定位物体。 PaCHash 最自然地被描述为静态的外部散列表, 它需要每个外部内存块的内存比重。 但是, PaCHash 也可以被破坏, 并且对内部内存有用, 空间消耗比以往所有方法都低, 即使只考虑相同大小的物体。 例如, 从某种意义上说, 它比K- perfect 散列的空间消耗要小一些。 快速的 SDDs 需要每个外部内存块大约5 位的内部内存, 每次搜索操作只需要一张磁盘( 变长), 并且与磁盘存取成本相比, 内部搜索顶部小一些。