Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena. However, they are notoriously difficult to solve to optimality, and there exist only a few approximation methods for obtaining high-quality solutions. In this study, grid-based approximations are used in combination with linear programming (LP) models to generate approximate policies for CPOMDPs. A detailed numerical study is conducted with six CPOMDP problem instances considering both their finite and infinite horizon formulations. The quality of approximation algorithms for solving unconstrained POMDP problems is established through a comparative analysis with exact solution methods. Then, the performance of the LP-based CPOMDP solution approaches for varying budget levels is evaluated. Finally, the flexibility of LP-based approaches is demonstrated by applying deterministic policy constraints, and a detailed investigation into their impact on rewards and CPU run time is provided. For most of the finite horizon problems, deterministic policy constraints are found to have little impact on expected reward, but they significantly increase CPU run time. For infinite horizon problems, the reverse is observed: deterministic policies tend to yield lower expected total rewards than their stochastic counterparts, but the impact of deterministic constraints on CPU run time is negligible. Overall, these results demonstrate that LP models can effectively generate approximate policies for both finite and infinite horizon problems while providing the flexibility to incorporate various additional constraints into the underlying model.
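As a rough illustration of how a grid-based LP for a discounted infinite horizon CPOMDP might look, the sketch below writes the standard occupancy-measure LP for a constrained MDP over a finite belief grid. The notation is assumed here for illustration and is not taken from the study: $G$ is the belief grid, $A$ the action set, $x(b,a)$ the discounted occupancy of grid belief $b$ and action $a$, $r$ and $c$ the expected reward and cost, $P(b' \mid b,a)$ the belief transition probabilities induced by projecting updated beliefs onto the grid, $\delta$ the initial distribution over grid beliefs, $\gamma \in (0,1)$ the discount factor, and $B$ the budget.

```latex
% Generic occupancy-measure LP on a belief grid (assumed notation, not the paper's).
\begin{align*}
\max_{x \ge 0} \quad
  & \sum_{b \in G} \sum_{a \in A} r(b,a)\, x(b,a) \\
\text{s.t.} \quad
  & \sum_{a \in A} x(b',a)
    - \gamma \sum_{b \in G} \sum_{a \in A} P(b' \mid b,a)\, x(b,a)
    = \delta(b') \qquad \forall\, b' \in G, \\
  & \sum_{b \in G} \sum_{a \in A} c(b,a)\, x(b,a) \le B .
\end{align*}

% A (possibly stochastic) policy is recovered wherever the denominator is positive:
\[
  \pi(a \mid b) = \frac{x(b,a)}{\sum_{a' \in A} x(b,a')} .
\]

% One common way to force a deterministic policy (an assumption, not necessarily
% the formulation used in the study): binary selectors y(b,a) with a big-M link.
% Summing the flow constraints gives \sum_{b,a} x(b,a) = 1/(1-\gamma) when
% \sum_b \delta(b) = 1, so M = 1/(1-\gamma) is a valid bound.
\begin{align*}
  & x(b,a) \le \frac{1}{1-\gamma}\, y(b,a) && \forall\, b \in G,\ a \in A, \\
  & \sum_{a \in A} y(b,a) = 1, \quad y(b,a) \in \{0,1\} && \forall\, b \in G .
\end{align*}
```

Under these assumptions, the budget enters as a single linear inequality, which is why further side constraints are easy to append. The binary selectors turn the LP into a mixed-integer program.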