We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (AST), where a given set of semantics-preserving rewrite rules can be applied on a specific AST pattern to generate a transformed and semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of application of these rewrite rules that leads to rewriting one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is simply checked by verifying it can be applied. If no valid sequence is produced by the neural network, the system reports the programs as non-equivalent, ensuring by design no programs may be incorrectly reported as equivalent. Our system is fully implemented for a given grammar which can represent straight-line programs with function calls and multiple types. To efficiently train the system to generate such sequences, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs
翻译:我们针对的是将两个程序对语句序列的语义等同性证据自动合成的问题。 我们用抽象的语法树( AST) 代表程序, 使用抽象的语法树( AST), 可以在特定的 AST 模式上应用特定的语义保留重写规则, 以产生一个变换和语义等同的程序。 在我们的系统中, 如果存在这些重写规则的应用序列, 导致将一个程序重写成另一个程序, 则两个程序就等同。 我们建议基于一个变压器模型建立一个神经网络结构, 以生成程序对对子的等性证据。 系统输出一个重写序列和序列的有效性, 只需通过核查即可检查它即可。 如果神经网络没有生成一个有效的序列, 系统将程序报告为非等同的, 通过设计, 没有错误地报告任何程序为等同的。 我们的系统被完全用于一个特定语法, 它可以代表直线程序、 功能调和多种类型。 为了高效地培训系统来生成这样的序列, 我们开发一个原始的渐进式培训技术, 命名的重写重写重写重写的顺序, 样选择一个系统, 我们的系统, 我们的精化的精化的精细的样选择。</s>