Cylindrical Algebraic Decomposition (CAD) is a key proof technique for formal verification of cyber-physical systems. CAD is computationally expensive, with worst-case complexity that is doubly exponential in the number of variables. Selecting a good variable ordering is therefore paramount to efficient use of CAD. Prior work has demonstrated that machine learning can be useful in determining efficient variable orderings. Much of this work has been driven by CAD problems extracted from applications of the MetiTarski theorem prover. In this paper, we revisit this prior work and consider issues of bias in existing training and test data. We observe that the classical MetiTarski benchmarks are heavily biased towards particular variable orderings. To address this, we apply symmetries to create a new dataset containing more than 41K MetiTarski challenges designed to remove bias. Furthermore, we evaluate issues of information leakage, and test the generalizability of our models on the new dataset.
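
To illustrate how applying symmetries can balance a dataset over variable orderings, the following is a minimal sketch, not the construction used in the paper. It assumes a problem over three variables named `x1`, `x2`, `x3`, stored as a plain string together with a label giving its best known ordering; the function names, the string format, and the label representation are all hypothetical. Renaming the variables by every permutation yields 3! = 6 equivalent problems whose labels cover all six orderings exactly once.

```python
from itertools import permutations
import re

# Assumption: every problem uses exactly these three variable names.
VARIABLES = ("x1", "x2", "x3")


def rename_variables(formula, mapping):
    """Rename variables in one pass so that swaps (x1 <-> x2) do not collide."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], formula)


def symmetric_variants(formula, best_ordering):
    """Return one (formula, label) pair per permutation of the variable names.

    The label is permuted with the same mapping, so each variant keeps a
    label that is consistent with its renamed formula.
    """
    variants = []
    for perm in permutations(VARIABLES):
        mapping = dict(zip(VARIABLES, perm))
        new_formula = rename_variables(formula, mapping)
        new_label = tuple(mapping[v] for v in best_ordering)
        variants.append((new_formula, new_label))
    return variants


if __name__ == "__main__":
    # Hypothetical example problem and label, for illustration only.
    for new_formula, new_label in symmetric_variants(
        "x1*x2 + x3 > 0", ("x1", "x2", "x3")
    ):
        print(new_formula, new_label)
```

Because each original problem contributes one instance per ordering, a classifier trained on the augmented set cannot score well simply by predicting the ordering that was most frequent in the original benchmarks.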