Given a policy, we define a SafeZone as a subset of states, such that most of the policy's trajectories are confined to this subset. The quality of the SafeZone is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. SafeZones are especially interesting when they have a small number of states and low escape probability. We study the complexity of finding optimal SafeZones, and show that in general the problem is computationally hard. For this reason we concentrate on computing approximate SafeZones. Our main result is a bi-criteria approximation algorithm which gives a factor of almost $2$ approximation for both the escape probability and SafeZone size, using a polynomial size sample complexity. We conclude the paper with an empirical evaluation of our algorithm.
翻译:根据一项政策,我们把安全区定义为一个州子集,因此该政策的大部分轨道都局限于这一子集。安全区的质量以州数和逃逸概率为参数,即随机轨迹离开该子集的概率为参数。当安全区拥有少数州和低逃逸概率时,安全区特别有趣。我们研究了找到最佳安全区的复杂性,并表明问题一般是计算困难的。为此,我们集中计算接近安全区的近似值。我们的主要结果是一种双标准近似算法,它使用多元体积的样本复杂度,使逃逸概率和安全区大小的近似值几乎达到2美元。我们用对算法的经验评估来完成这份文件。