We present a formal language for specifying qualitative preferences over temporal goals and a preference-based planning method in stochastic systems. Using automata-theoretic modeling, the proposed specification allows us to express preferences over different sets of outcomes, where each outcome describes a set of temporal sequences of subgoals. We define the value of preference satisfaction given a stochastic process over possible outcomes and develop an algorithm for time-constrained probabilistic planning in labeled Markov decision processes where an agent aims to maximally satisfy its preference formula within a pre-defined finite time duration. We present experimental results using a stochastic gridworld example and discuss possible extensions of the proposed preference model.
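As a rough illustration of what time-constrained planning toward preferred outcomes can look like, the sketch below runs finite-horizon backward induction on a toy MDP. It is not the algorithm proposed in this work: the state names, the transition table, and the numeric `terminal_value` scores (a stand-in for how well each outcome satisfies the preference) are all hypothetical, and the automata-theoretic preference model is replaced here by a simple scalar ranking of end states.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP: transitions[state][action] = [(next_state, probability), ...]
Transitions = Dict[str, Dict[str, List[Tuple[str, float]]]]


def finite_horizon_plan(
    transitions: Transitions,
    terminal_value: Dict[str, float],
    horizon: int,
) -> Tuple[Dict[str, float], List[Dict[str, str]]]:
    """Backward induction maximizing the expected terminal value after `horizon` steps.

    Returns the value at time 0 for each state and a time-indexed policy,
    where policy[t][state] is the action to take at step t.
    """
    values = dict(terminal_value)  # value with zero steps remaining
    policy: List[Dict[str, str]] = []
    for _ in range(horizon):
        new_values: Dict[str, float] = {}
        step_policy: Dict[str, str] = {}
        for state, actions in transitions.items():
            best_action, best_q = None, float("-inf")
            for action, outcomes in actions.items():
                # Expected value of taking `action`, then acting optimally afterwards.
                q = sum(prob * values[nxt] for nxt, prob in outcomes)
                if q > best_q:
                    best_action, best_q = action, q
            new_values[state] = best_q
            step_policy[state] = best_action
        values = new_values
        policy.insert(0, step_policy)  # earlier time steps go to the front
    return values, policy


# Assumed example: "best" is the most preferred outcome, "ok" is acceptable, "fail" is worst.
toy_transitions: Transitions = {
    "start": {"safe": [("ok", 1.0)], "risky": [("best", 0.6), ("fail", 0.4)]},
    "ok": {"stay": [("ok", 1.0)]},
    "best": {"stay": [("best", 1.0)]},
    "fail": {"stay": [("fail", 1.0)]},
}
toy_terminal_value = {"start": 0.0, "ok": 0.5, "best": 1.0, "fail": 0.0}

values, policy = finite_horizon_plan(toy_transitions, toy_terminal_value, horizon=2)
```

With these assumed numbers, the risky action has the higher expected score from `start`, so the computed policy selects it at the first step. A qualitative preference model would generally not reduce to a single scalar score like this; the scalar is used only to keep the sketch short.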