We consider qualitative strategy synthesis for the formalism called consumption Markov decision processes. This formalism can model dynamics of an agents that operates under resource constraints in a stochastic environment. The presented algorithms work in time polynomial with respect to the representation of the model and they synthesize strategies ensuring that a given set of goal states will be reached (once or infinitely many times) with probability 1 without resource exhaustion. In particular, when the amount of resource becomes too low to safely continue in the mission, the strategy changes course of the agent towards one of a designated set of reload states where the agent replenishes the resource to full capacity; with sufficient amount of resource, the agent attempts to fulfill the mission again. We also present two heuristics that attempt to reduce expected time that the agent needs to fulfill the given mission, a parameter important in practical planning. The presented algorithms were implemented and numerical examples demonstrate (i) the effectiveness (in terms of computation time) of the planning approach based on consumption Markov decision processes and (ii) the positive impact of the two heuristics on planning in a realistic example.
翻译:我们考虑对所谓消费Markov决策程序的形式主义进行定性战略综合。这种形式主义可以模拟在资源限制下在随机环境中运作的代理人的动态。所提出的算法在模型代表方面在时间上的多式运作,它们综合了各种战略,以确保在不耗尽资源的情况下达到特定的目标状态(一旦或无限多次),概率1,而没有耗尽资源。特别是,当资源量太低,以致无法在任务中安全地继续下去时,该代理人的路线将改变为指定的一组再加载状态之一,即该代理人将资源补充到全部能力;用足够数量的资源,该代理人试图再次完成这项任务。我们还提出了两条推论,试图缩短该代理人完成特定任务所需的预期时间,这是实际规划中的一个重要参数。所提出的算法得到执行,并以数字实例表明:(一) 以消费Markov决策过程为基础的规划方法的有效性(计算时间),以及(二) 两种高调对规划的积极影响。