The classic way of computing a $k$-universal hash function is to use a random degree-$(k-1)$ polynomial over a prime field $\mathbb Z_p$. For a fast computation of the polynomial, the prime $p$ is often chosen as a Mersenne prime $p=2^b-1$. In this paper, we show that there are other nice advantages to using Mersenne primes. Our view is that the hash function's output is a $b$-bit integer that is uniformly distributed in $\{0, \dots, 2^b-1\}$, except that $p$ (the all \texttt1s value in binary) is missing. Uniform bit strings have many nice properties, such as splitting into substrings which gives us two or more hash functions for the cost of one, while preserving strong theoretical qualities. We call this trick "Two for one" hashing, and we demonstrate it on 4-universal hashing in the classic Count Sketch algorithm for second-moment estimation. We also provide a new fast branch-free code for division and modulus with Mersenne primes. Contrasting our analytic work, this code generalizes to any Pseudo-Mersenne primes $p=2^b-c$ for small $c$.
翻译:计算 $k$- 通用散列函数的经典方式是使用随机度( k-1) $\\ mathb + p$ 。 快速计算多元值时, 通常选择美元为 Mersenne Print $p= 2 ⁇ b-1$。 在本文中, 我们显示使用 Mersenne 质素还具有其他优点。 我们的观点是, 散列函数的输出是 $b$- bit 整数, 平均以 $_0, \ dots, 2 ⁇ b-1 $ = 美元分配, 除了缺少 $p$ (二进制值中所有\ textt1 值) $ 。 统一位元字符有许多不错的属性, 例如拆分化成子字符, 给我们两个或两个以上的功能来支付一个成本, 同时保留强大的理论品质。 我们称之为“ 一对一一”, 我们用 4- 通用 显示它在二次移动算法计算 $ 美元 的经典伯爵 算法中以 $ $ $ $ $ $ $ $ 。 我们还提供 将 硬质 的 格式 格式 和 任何普通 格式 格式 。