This paper analyzes finite-state Markov decision processes (MDPs) whose parameters lie in compact uncertainty sets, and re-examines results from robust MDPs via set-based fixed point theory. We generalize the Bellman and policy evaluation operators to operators that contract on the space of value functions, which we call \emph{value operators}. We further generalize these value operators to act on the space of value function sets, which we call \emph{set-based value operators}, and prove that they are contractions on the space of compact value function sets. Leveraging insights from set theory, we generalize the rectangularity condition for the Bellman operator from the classic robust MDP literature to a \emph{containment condition} for a generic value operator; the containment condition is weaker and applies to a broader class of parameter-uncertain MDPs and of contractive operators arising in dynamic programming and reinforcement learning. We prove that both the rectangularity condition and the containment condition are sufficient to ensure that the set-based value operator's fixed point set contains its own supremum and infimum elements. For convex and compact sets of uncertain MDP parameters, we show that the classic robust value function coincides with the supremum of the fixed point set of the set-based Bellman operator. When the MDP parameters vary dynamically within compact sets, we prove a set convergence result for value iteration, which otherwise may not converge to a single value function.
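For illustration, a minimal sketch of the set-based generalization described above, under assumed notation that the abstract does not fix: a compact parameter set $\Theta$, per-parameter contractive value operators $f_\theta$, and a compact set of value functions $\mathcal{V}$. One natural form of the set-based value operator is
\[
F(\mathcal{V}) \;=\; \overline{\bigl\{\, f_\theta(v) \;:\; v \in \mathcal{V},\; \theta \in \Theta \,\bigr\}},
\]
where the closure keeps $F(\mathcal{V})$ compact. Under the Hausdorff metric on compact sets, $F$ inherits the contraction property of the individual operators $f_\theta$, so it admits a unique compact fixed point set; this sketch is indicative only and need not match the paper's exact construction.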