In many applications one wants to identify identical subtrees of a program syntax tree. This identification should ideally be robust to alpha-renaming of the program, but no existing technique has been shown to achieve this with good efficiency (better than $\mathcal{O}(n^2)$ in expression size). We present a new, asymptotically efficient way to hash modulo alpha-equivalence. A key insight of our method is to use a weak (commutative) hash combiner at exactly one point in the construction, which admits an algorithm with $\mathcal{O}(n (\log n)^2)$ time complexity. We prove that the use of the commutative combiner nevertheless yields a strong hash with low collision probability. Numerical benchmarks attest to the asymptotic behaviour of the method.
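As a rough illustration of why a commutative combiner can help with efficiency, the sketch below hashes a small map by XOR-ing strong per-entry hashes. It is not the paper's algorithm: the abstract does not say which commutative operation is used or where it is applied, so XOR, the map of variable names to position tags, and all function names here (`entry_hash`, `map_hash`, `add_entry`, `remove_entry`) are assumptions made for the example.

```python
import hashlib


def entry_hash(key: str, value: str) -> int:
    """Strong 64-bit hash of one (key, value) entry (hypothetical helper)."""
    data = f"{key}\x00{value}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def map_hash(entries: dict) -> int:
    """Commutative combination: XOR of the entry hashes.

    XOR is assumed here for illustration; the result does not depend on
    the order in which entries are visited.
    """
    h = 0
    for key, value in entries.items():
        h ^= entry_hash(key, value)
    return h


def add_entry(h: int, key: str, value: str) -> int:
    """Update a map hash in O(1) when an entry is added."""
    return h ^ entry_hash(key, value)


def remove_entry(h: int, key: str, value: str) -> int:
    """Update a map hash in O(1) when an entry is removed (XOR is self-inverse)."""
    return h ^ entry_hash(key, value)


# Two maps with the same entries hash identically regardless of insertion order.
a = map_hash({"x": "left", "y": "right"})
b = map_hash({"y": "right", "x": "left"})
assert a == b

# Incremental updates agree with rehashing from scratch.
assert add_entry(map_hash({"x": "left"}), "y", "right") == a
assert remove_entry(a, "y", "right") == map_hash({"x": "left"})
```

Because the combination is order-independent and invertible, a hash of this kind can be maintained as entries come and go without rehashing the whole map. The trade-off is that XOR on its own is a weak combiner, which is why each entry is hashed with a strong function before being combined.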