Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm that generates approximate policies for large Partially Observable Markov Decision Processes (POMDPs). Its online nature supports scalability by avoiding a complete policy representation. The lack of an explicit representation, however, hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theories (SMT) for analyzing POMCP policies by inspecting their traces, namely the sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps: i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions; ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem); iii) the analysis of the generated logical formula and the related decision boundaries to identify unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and on a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge of the domain and outperforms state-of-the-art anomaly detection methods in identifying unexpected decisions. In our tests, the Area Under Curve improved by up to 47%.
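The three steps can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a simplified question with a single free threshold `x` ("the agent takes `action` if and only if its belief is at least `x`"), a hypothetical Tiger-like trace, and an exhaustive search over candidate thresholds standing in for a real MAX-SMT solver. The names `TRACE`, `fit_threshold`, and `unexpected_steps` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical trace (an assumption, not data from the paper):
# (belief that the tiger is behind the left door, action taken by the policy).
TRACE: List[Tuple[float, str]] = [
    (0.50, "listen"),
    (0.85, "listen"),
    (0.97, "open_right"),
    (0.50, "listen"),
    (0.85, "listen"),
    (0.97, "open_right"),
    (0.85, "open_right"),  # deviates from the rest of the trace
    (0.15, "listen"),
]


def fit_threshold(trace: List[Tuple[float, str]], action: str) -> Tuple[float, int]:
    """Step ii (simplified): pick the threshold x that maximizes the number of
    trace steps consistent with the rule 'action is taken iff belief >= x'.

    A brute-force search over observed belief values replaces the MAX-SMT
    solver; it is only adequate for a single-parameter rule like this one.
    """
    # The optimum can always be found among observed beliefs, plus +inf
    # for the degenerate rule "the action is never taken".
    candidates = sorted({belief for belief, _ in trace}) + [float("inf")]
    best_x, best_satisfied = candidates[0], -1
    for x in candidates:
        satisfied = sum((belief >= x) == (taken == action) for belief, taken in trace)
        if satisfied > best_satisfied:
            best_x, best_satisfied = x, satisfied
    return best_x, best_satisfied


def unexpected_steps(trace: List[Tuple[float, str]], action: str, x: float) -> List[int]:
    """Step iii (simplified): indices of steps that violate the fitted rule."""
    return [
        i for i, (belief, taken) in enumerate(trace)
        if (belief >= x) != (taken == action)
    ]
```

Calling `fit_threshold(TRACE, "open_right")` yields the fitted decision boundary together with the number of satisfied steps, and `unexpected_steps` then lists the steps that fall on the wrong side of that boundary. These are candidates for inspection by a domain expert, not confirmed errors.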
