Applying machine learning to mathematical terms and formulas requires a representation of formulas that is suitable for AI methods. In this paper, we develop an encoding that preserves logical properties and is additionally reversible: the tree shape of a formula, including all its symbols, can be reconstructed from the dense vector representation. We achieve this by training two decoders: one that extracts the top symbol of the tree and one that extracts the embedding vectors of its subtrees. The syntactic and semantic logical properties that we aim to preserve range from structural formula properties and the applicability of natural deduction steps to more complex operations such as unifiability. We propose datasets that can be used to train models for these syntactic and semantic properties. We evaluate the viability of the developed encoding on the proposed datasets as well as on the practical theorem-proving problem of premise selection in the Mizar corpus.
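As a rough illustration of the two-decoder setup described above, the following PyTorch-style sketch shows how a formula tree might be encoded into a dense vector, with one decoder predicting the top symbol and another recovering the embeddings of the immediate subtrees. This is not the paper's implementation; all module names, dimensions, and the fixed maximum arity are illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the authors' code) of a reversible
# tree encoding with a top-symbol decoder and a subtree-embedding decoder.

import torch
import torch.nn as nn

DIM = 64          # assumed embedding dimension
N_SYMBOLS = 128   # assumed vocabulary of function/predicate symbols
MAX_ARITY = 2     # assumed maximum arity handled by the subtree decoder


class TreeEncoder(nn.Module):
    """Recursively encodes a formula tree into a single dense vector."""

    def __init__(self):
        super().__init__()
        self.symbol_emb = nn.Embedding(N_SYMBOLS, DIM)
        self.combine = nn.Sequential(
            nn.Linear(DIM * (1 + MAX_ARITY), DIM), nn.Tanh()
        )

    def forward(self, tree):
        symbol, children = tree  # tree = (symbol_id, [subtrees])
        child_vecs = [self(c) for c in children]
        # pad missing children with zero vectors up to MAX_ARITY
        child_vecs += [torch.zeros(DIM)] * (MAX_ARITY - len(child_vecs))
        parts = [self.symbol_emb(torch.tensor(symbol))] + child_vecs
        return self.combine(torch.cat(parts))


class SymbolDecoder(nn.Module):
    """Predicts the top symbol of the encoded tree."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(DIM, N_SYMBOLS)

    def forward(self, vec):
        return self.head(vec)  # logits over symbols


class SubtreeDecoder(nn.Module):
    """Recovers the embedding vectors of the immediate subtrees."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(DIM, DIM * MAX_ARITY)

    def forward(self, vec):
        return self.head(vec).view(MAX_ARITY, DIM)


# Usage sketch: encode f(x, y) with hypothetical symbol ids, then decode.
encoder, sym_dec, sub_dec = TreeEncoder(), SymbolDecoder(), SubtreeDecoder()
formula = (3, [(7, []), (9, [])])      # f(x, y) as (symbol, children)
vec = encoder(formula)                 # dense vector representation
top_symbol_logits = sym_dec(vec)       # trained with cross-entropy on symbols
child_embeddings = sub_dec(vec)        # trained to match the encoder's outputs
```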