Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena. However, they are notoriously difficult to solve to optimality, and there exist only a few approximation methods for obtaining high-quality solutions. In this study, we use grid-based approximations in combination with linear programming (LP) models to generate approximate policies for CPOMDPs. We consider five CPOMDP problem instances and conduct a detailed numerical study of both their finite and infinite horizon formulations. We first establish the quality of the approximate unconstrained POMDP policies through a comparative analysis with exact solution methods. We then show the performance of the LP-based CPOMDP solution approaches for varying budget levels (i.e., cost limits) for different problem instances. Finally, we show the flexibility of LP-based approaches by applying deterministic policy constraints, and investigate the impact that these constraints have on collected rewards and CPU run time. Our analysis demonstrates that LP models can effectively generate approximate policies for both finite and infinite horizon problems, while providing the flexibility to incorporate various additional constraints into the underlying model.