We introduce a new probabilistic temporal logic for the verification of Markov Decision Processes (MDPs). Our logic is the first to include operators for causal reasoning, allowing us to express interventional and counterfactual queries. Given a path formula $\phi$, an interventional property is concerned with the satisfaction probability of $\phi$ if we apply a particular change $I$ to the MDP (e.g., switching to a different policy); a counterfactual allows us to compute, given an observed MDP path $\tau$, what the outcome of $\phi$ would have been had we applied $I$ in the past. Because of its ability to reason about different configurations of the MDP, our approach represents a departure from existing probabilistic temporal logics, which can only reason about a fixed system configuration. From a syntactic viewpoint, we introduce a generalized counterfactual operator that subsumes both interventional and counterfactual probabilities, as well as the traditional probabilistic operator found in, e.g., PCTL. From a semantic viewpoint, our logic is interpreted over a structural causal model (SCM) translation of the MDP, which gives us a representation amenable to counterfactual reasoning. We provide a proof-of-concept evaluation of our logic on a reach-avoid task in a grid-world model.