We propose a framework for learning a fragment of probabilistic computation tree logic (pCTL) formulae from a set of states labeled as safe or unsafe. We work in a relational setting and combine ideas from relational Markov Decision Processes with pCTL model checking. More specifically, we assume that there is an unknown relational pCTL target formula that is satisfied only by safe states, has a horizon of at most $k$ steps, and a threshold probability $\alpha$. The task then consists of learning this unknown formula from states labeled as safe or unsafe by a domain expert. We apply principles of relational learning to induce a pCTL formula that is satisfied by all safe states and by none of the unsafe ones. This formula can then serve as a safety specification for the domain, so that the system can avoid dangerous situations in the future. Following relational learning principles, we introduce a candidate formula generation process, as well as a method for deciding whether a candidate formula is a satisfactory specification for the given labeled states. We treat both the case where the expert knows the system policy and the case where the policy is unknown; much of the learning process is the same in both settings. We evaluate our approach on a synthetic relational domain.
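For intuition, a formula in such a fragment could take the following shape, where the relational atom $\mathit{danger}(X)$ is purely illustrative and not taken from this work:
\[
\varphi \;=\; P_{\geq \alpha}\!\left[\, G^{\leq k}\; \neg\,\mathit{danger}(X) \,\right],
\]
which holds in a state if, with probability at least $\alpha$, no state satisfying $\mathit{danger}(X)$ is reached within the next $k$ steps under the system's policy. In this reading, the safe states would be exactly those satisfying $\varphi$.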