Learning program semantics from raw source code is challenging due to the complexity of real-world programming language syntax and due to the difficulty of reconstructing long-distance relational information implicitly represented in programs using identifiers. Addressing the first point, we consider Constrained Horn Clauses (CHCs) as a standard representation of program verification problems, providing a simple and programming language-independent syntax. For the second challenge, we explore graph representations of CHCs, and propose a new Relational Hypergraph Neural Network (R-HyGNN) architecture to learn program features. We introduce two different graph representations of CHCs. One is called constraint graph (CG), and emphasizes syntactic information of CHCs by translating the symbols and their relations in CHCs as typed nodes and binary edges, respectively, and constructing the constraints as abstract syntax trees. The second one is called control- and data-flow hypergraph (CDHG), and emphasizes semantic information of CHCs by representing the control and data flow through ternary hyperedges. We then propose a new GNN architecture, R-HyGNN, extending Relational Graph Convolutional Networks, to handle hypergraphs. To evaluate the ability of R-HyGNN to extract semantic information from programs, we use R-HyGNNs to train models on the two graph representations, and on five proxy tasks with increasing difficulty, using benchmarks from CHC-COMP 2021 as training data. The most difficult proxy task requires the model to predict the occurrence of clauses in counter-examples, which subsumes satisfiability of CHCs. CDHG achieves 90.59% accuracy in this task. Furthermore, R-HyGNN has perfect predictions on one of the graphs consisting of more than 290 clauses. Overall, our experiments indicate that R-HyGNN can capture intricate program features for guiding verification problems.
翻译:原始源代码的学习程序语义之所以具有挑战性,是因为真实世界编程语言语法的复杂性,也因为难以重建在使用标识符的程序中隐含的长距离关系信息。 谈到第一点,我们认为Constraced Horn Clales(CHC)是程序核查问题的标准表示, 提供了简单的和编程中依赖语言的语法。 在第二个挑战中, 我们探索CHCC的图形表达方式, 并提议一个新的 Relationalalphraphy Neal 网络( R- HyGNN) 结构来学习程序特性。 我们引入了两个不同的 CHC 的图形表达方式。 一个叫做限制图形(CGGG), 并强调了CHC 的合成信息信息, 将这些符号及其在 CHC 中的关系分别转换为输入节点和二进制语言的边框。 第二个叫做控制与数据流高调(CHGHG), 并且强调 CHCCs 的语义信息,, 代表着控制和数据流流流流流, 。