Decentralized stochastic control problems involving general state, measurement, and action spaces are intrinsically difficult to study because standard tools from centralized (single-agent) stochastic control do not apply. In this paper, we address some of these challenges for decentralized stochastic control with standard Borel spaces under two distinct but closely related information structures: the one-step delayed information sharing pattern (OSDISP) and the $K$-step periodic information sharing pattern (KSPISP). By resolving several measurability and topological questions, we show that both the one-step delayed and the $K$-step periodic problems can be reduced to a centralized Markov Decision Process (MDP), generalizing prior results that considered finite, linear, or static models. We then provide sufficient conditions under which the transition kernels of both centralized reductions are weak Feller. With these results, we establish the existence and separated nature of optimal policies under both information structures. The weak Feller regularity also facilitates rigorous approximation and learning-theoretic results, as shown in the paper.
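For concreteness, the weak Feller property invoked above is the standard weak-continuity requirement on a controlled transition kernel. Writing $T(\,\cdot \mid x, u)$ for a kernel on a standard Borel state space $\mathsf{X}$ with action space $\mathsf{U}$ (generic notation for illustration, not necessarily the paper's own), $T$ is weak Feller if
\[
(x_n, u_n) \to (x, u)
\;\Longrightarrow\;
\int_{\mathsf{X}} f(y)\, T(dy \mid x_n, u_n)
\to
\int_{\mathsf{X}} f(y)\, T(dy \mid x, u)
\qquad \text{for all } f \in C_b(\mathsf{X}),
\]
that is, the map $(x, u) \mapsto \int_{\mathsf{X}} f(y)\, T(dy \mid x, u)$ is jointly continuous for every bounded continuous $f$.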