As AI systems migrate to safety-critical domains, verifying that their actions comply with well-defined rules remains a challenge. Formal methods provide provable guarantees but demand hand-crafted temporal-logic specifications, offering limited expressiveness and accessibility. Deep learning approaches enable evaluation of plans against natural-language constraints, yet their opaque decision process invites misclassifications with potentially severe consequences. We introduce RepV, a neurosymbolic verifier that unifies both views by learning a latent space where safe and unsafe plans are linearly separable. Starting from a modest seed set of plans labeled by an off-the-shelf model checker, RepV trains a lightweight projector that embeds each plan, together with a language-model-generated rationale, into a low-dimensional space; a frozen linear boundary then verifies compliance with unseen natural-language rules in a single forward pass. Beyond binary classification, RepV provides a probabilistic guarantee on the likelihood of correct verification based on the plan's position in the latent space. This guarantee, in turn, enables guarantee-driven refinement of the planner, improving rule compliance without human annotations. Empirical evaluations show that RepV improves compliance prediction accuracy by up to 15% compared to baseline methods while adding fewer than 0.2M parameters. Furthermore, our refinement framework outperforms ordinary fine-tuning baselines across various planning domains. These results show that safety-separable latent spaces offer a scalable, plug-and-play primitive for reliable neurosymbolic plan verification. Code and data are available at: https://repv-project.github.io/.
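To make the pipeline concrete, the sketch below illustrates the verification step described above under stated assumptions: a lightweight projector maps concatenated embeddings of a plan and its language-model-generated rationale into a low-dimensional latent space, a frozen linear boundary decides compliance, and the verification probability is taken as a sigmoid of the signed distance to that boundary. All names, dimensions, and the calibration choice (`LatentVerifier`, `embed_dim`, `latent_dim`, the sigmoid mapping) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of latent-space plan verification (assumptions, not RepV's actual code).
import torch
import torch.nn as nn

class LatentVerifier(nn.Module):
    def __init__(self, embed_dim: int = 768, latent_dim: int = 8):
        super().__init__()
        # Lightweight projector over plan + rationale embeddings
        # (~0.2M parameters with these hypothetical sizes).
        self.projector = nn.Sequential(
            nn.Linear(2 * embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Frozen linear decision boundary in the latent space.
        self.boundary = nn.Linear(latent_dim, 1)
        for p in self.boundary.parameters():
            p.requires_grad = False

    def forward(self, plan_emb: torch.Tensor, rationale_emb: torch.Tensor):
        # Project the concatenated plan/rationale embedding into the latent space.
        z = self.projector(torch.cat([plan_emb, rationale_emb], dim=-1))
        # Signed distance to the boundary, mapped to a verification probability.
        signed_dist = self.boundary(z)
        p_compliant = torch.sigmoid(signed_dist)
        return (p_compliant > 0.5), p_compliant

# Usage with placeholder embeddings standing in for a frozen text encoder's outputs:
verifier = LatentVerifier()
plan_emb, rationale_emb = torch.randn(1, 768), torch.randn(1, 768)
is_compliant, confidence = verifier(plan_emb, rationale_emb)
```

In this sketch the probabilistic guarantee reduces to distance-based confidence; the confidence score is what a guarantee-driven refinement loop would use to select low-confidence plans for replanning.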