We target the problem of provably computing the equivalence between two complex expression trees. To this end, we formalize the problem of equivalence between two such programs as finding a set of semantics-preserving rewrite rules from one into the other, such that after the rewrite the two programs are structurally identical, and therefore trivially equivalent.We then develop a graph-to-sequence neural network system for program equivalence, trained to produce such rewrite sequences from a carefully crafted automatic example generation algorithm. We extensively evaluate our system on a rich multi-type linear algebra expression language, using arbitrary combinations of 100+ graph-rewriting axioms of equivalence. Our machine learning system guarantees correctness for all true negatives, and ensures 0 false positive by design. It outputs via inference a valid proof of equivalence for 93% of the 10,000 equivalent expression pairs isolated for testing, using up to 50-term expressions. In all cases, the validity of the sequence produced and therefore the provable assertion of program equivalence is always computable, in negligible time.
翻译:我们针对两个复杂的表达式树的等同计算问题。 为此, 我们正式确定两个程序之间的等同问题, 比如从一个程序到另一个程序, 找到一套语义保留重写规则, 这样在重写这两个程序后, 两个程序在结构上是相同的, 因而是微不足道的等同的。 然后我们开发一个程序等同的图形到序列神经网络系统, 受过培训, 从精心设计的自动生成的自动示例生成算法中生成这样的重写序列。 我们广泛评估了我们关于一种丰富的多类型线性代数表达语言的系统, 使用100+图形重写等同词的任意组合。 我们的机器学习系统保证了所有真实负数的正确性, 并通过设计确保0个错误的正数。 它通过推断一个有效的证据, 用于测试的10,000对等的表达式的93%, 并且使用高达50年期的表达法。 在所有情况中, 生成的序列的有效性以及因此对程序等同的可辨称总是可以比较的, 在微不足道的时间里。