Motivated by uncertain parameters encountered in Markov decision processes (MDPs) and stochastic games, we study the effect of parameter uncertainty on Bellman operator-based algorithms under a set-based framework. Specifically, we first consider a family of MDPs whose cost parameters lie in a given compact set; we then define a Bellman operator that acts on a set of value functions and produces a new set of value functions, namely the output under all possible variations of the cost parameter. We prove that this set-based Bellman operator has a fixed point by showing that it is contractive on a complete metric space, and we explore its relationship with the corresponding family of MDPs and stochastic games. Additionally, we show that when the cost parameters are bounded by interval sets, we can form exact bounds on the set of optimal value functions. Finally, we use our results to bound the value function trajectory of a player in a stochastic game.
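The interval-bound claim can be illustrated with a minimal sketch, under assumptions that are not taken from the abstract: a finite, discounted, cost-minimizing MDP with a fixed transition kernel, where only the costs are uncertain and each cost entry lies in an interval `[C_lo, C_hi]`. Because the standard Bellman operator is monotone in the cost, running ordinary value iteration at the two interval endpoints gives element-wise lower and upper bounds on the optimal value function of every MDP in the family. The function names (`bellman`, `value_iteration`, `interval_bounds`) and the toy instance are hypothetical, and this is not necessarily the construction used in the paper.

```python
import random


def bellman(V, C, P, gamma):
    """Apply the cost-minimizing Bellman operator once.

    C[s][a] is the stage cost, P[s][a][t] the probability of moving
    from state s to state t under action a, gamma the discount factor.
    """
    n_states = len(V)
    return [
        min(
            C[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
            for a in range(len(C[s]))
        )
        for s in range(n_states)
    ]


def value_iteration(C, P, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator until successive iterates differ by < tol."""
    V = [0.0] * len(C)
    for _ in range(max_iter):
        V_next = bellman(V, C, P, gamma)
        if max(abs(x - y) for x, y in zip(V_next, V)) < tol:
            return V_next
        V = V_next
    return V


def interval_bounds(C_lo, C_hi, P, gamma):
    """Bound the optimal value functions of all MDPs with C_lo <= C <= C_hi.

    Assumes element-wise interval uncertainty in the cost only.
    """
    return value_iteration(C_lo, P, gamma), value_iteration(C_hi, P, gamma)


# Toy instance (all numbers are made up for illustration).
random.seed(0)
S, A, gamma = 4, 3, 0.9

# Random transition kernel: each P[s][a] is normalized to sum to 1.
P = []
for s in range(S):
    rows = []
    for a in range(A):
        weights = [random.random() for _ in range(S)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    P.append(rows)

# Interval cost bounds with C_lo <= C_hi element-wise.
C_lo = [[random.random() for _ in range(A)] for _ in range(S)]
C_hi = [[c + random.random() for c in row] for row in C_lo]

V_lo, V_hi = interval_bounds(C_lo, C_hi, P, gamma)

# Sample one cost inside the interval and check that its optimal value
# function is sandwiched between the two bounds.
C_mid = [
    [lo + random.random() * (hi - lo) for lo, hi in zip(row_lo, row_hi)]
    for row_lo, row_hi in zip(C_lo, C_hi)
]
V_mid = value_iteration(C_mid, P, gamma)
```

In this sketch `V_lo` and `V_hi` are themselves optimal value functions of members of the family (the MDPs with costs `C_lo` and `C_hi`), so under these assumptions the element-wise bounds are attained rather than merely conservative.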