We present an efficient robust value iteration for \texttt{s}-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to that of standard (non-robust) MDPs, which is significantly faster than any existing method. We do so by deriving the optimal robust Bellman operator in concrete form using our $L_p$ water filling lemma. We unveil the exact form of the optimal policies, which turn out to be novel threshold policies that play each action with probability proportional to its advantage.
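As an illustrative sketch only (the precise statement is derived in the body of the paper), the threshold form described above can be pictured as playing only those actions whose advantage exceeds a state-dependent cutoff, with probability proportional to the excess; here the advantage $A(s,a)$ and the threshold $\omega_s$ are assumed notation, not the paper's exact quantities:
\[
  \pi^{*}(a \mid s) \;\propto\; \bigl(A(s,a) - \omega_{s}\bigr)_{+},
  \qquad (x)_{+} := \max(x, 0).
\]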