The planning domain has experienced increased interest in the formal synthesis of decision-making policies. This formal synthesis typically entails finding a policy that satisfies formal specifications expressed in some well-defined logic. While many such logics have been proposed, with varying degrees of expressiveness and complexity in the agent behaviors they can capture, their value is limited when it comes to deriving decision-making policies that satisfy certain types of asymptotic behavior in general system models. In particular, we are interested in specifying constraints on the steady-state behavior of an agent, which captures the proportion of time the agent spends in each state as it interacts with its environment for an indefinite period of time. This is sometimes called the average or expected behavior of the agent, and the associated planning problem faces significant challenges unless strong restrictions are imposed on the connectivity of the underlying model's graph structure. In this paper, we explore this steady-state planning problem: deriving a decision-making policy for an agent such that constraints on its steady-state behavior are satisfied. We propose a linear programming solution for the general case of multichain Markov Decision Processes (MDPs) and prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
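To make the flavor of the approach concrete, the following is a minimal sketch of the classical steady-state LP for the simpler unichain (ergodic) special case, not the paper's multichain construction. Variables x(s,a) encode steady-state state-action frequencies; flow-balance and normalization constraints force them to form a stationary distribution, and a lower bound on the visitation frequency of a designated state plays the role of a steady-state specification. All model parameters below are illustrative, and the treatment of transient states — the crux of the multichain case — is deliberately elided.

```python
# Minimal sketch of the steady-state planning LP, assuming a unichain
# (ergodic) MDP. The multichain general case handled in the paper requires
# additional machinery for transient states. All numbers are illustrative.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 3, 2
# P[a, s, s'] = probability of moving to s' when taking action a in state s.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.2, 0.7, 0.1], [0.0, 0.3, 0.7], [0.5, 0.0, 0.5]],  # action 1
])
R = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])  # R[s, a] = reward

n_vars = n_states * n_actions          # one variable x[s, a] per pair
idx = lambda s, a: s * n_actions + a   # flatten (s, a) -> column index

# Flow balance for each state s': sum_a x[s',a] - sum_{s,a} P[a,s,s'] x[s,a] = 0,
# plus the normalization row sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((n_states + 1, n_vars))
for s in range(n_states):
    for a in range(n_actions):
        A_eq[s, idx(s, a)] += 1.0
        for sp in range(n_states):
            A_eq[sp, idx(s, a)] -= P[a, s, sp]
A_eq[n_states, :] = 1.0
b_eq = np.zeros(n_states + 1)
b_eq[n_states] = 1.0

# Steady-state specification (illustrative): spend at least 30% of the time
# in state 2, written as -sum_a x[2,a] <= -0.3.
A_ub = np.zeros((1, n_vars))
A_ub[0, idx(2, 0)] = A_ub[0, idx(2, 1)] = -1.0
b_ub = np.array([-0.3])

# Maximize expected average reward = minimize its negation; the default
# linprog bounds x >= 0 are exactly the nonnegativity of frequencies.
c = -np.array([R[s, a] for s in range(n_states) for a in range(n_actions)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
assert res.status == 0, res.message
x = res.x.reshape(n_states, n_actions)

# Recover a stationary policy pi(a|s) = x[s,a] / sum_a x[s,a]; states with
# (numerically) zero steady-state mass get an arbitrary uniform default.
mass = x.sum(axis=1, keepdims=True)
pi = np.divide(x, mass, out=np.full_like(x, 1.0 / n_actions), where=mass > 1e-12)
print("steady-state distribution:", x.sum(axis=1))
print("policy:\n", pi)
```

Under ergodicity, the recovered stationary policy induces exactly the steady-state distribution found by the LP; the difficulty in the multichain setting is that this extraction step breaks down on transient states and on models with multiple recurrent classes, which is precisely what the proposed programs address.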