We show subexponential lower bounds (i.e., $2^{\Omega(n^c)}$ for some constant $c > 0$) on the smoothed complexity of the classical Howard's Policy Iteration algorithm for Markov Decision Processes. The bounds hold for the total reward and the average reward criteria. The constructions are robust in the sense that the subexponential bound holds not only on average over independent random perturbations of the MDP parameters (transition probabilities and rewards), but for all perturbations within an inverse-polynomial range. We also show an exponential lower bound on the worst-case complexity for the simple reachability objective.
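For orientation, the following is a minimal sketch of Howard's Policy Iteration in its all-switches form, the variant whose complexity the lower bounds concern. To keep the illustration self-contained it uses a discounted reward criterion so that policy evaluation is a finite linear solve; the paper's results are for the total and average reward criteria, and the names `P`, `R`, and `gamma` below are assumptions of this sketch, not the paper's construction.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Howard's (all-switches) Policy Iteration, illustrative discounted version.

    P[a] : (S, S) transition matrix for action a   (assumed input format)
    R[a] : (S,)   immediate rewards for action a   (assumed input format)
    """
    n_actions = len(P)
    n_states = P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)  # start from an arbitrary policy

    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

        # Policy improvement: Howard's rule switches every improvable state at once.
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])  # shape (A, S)
        new_policy = Q.argmax(axis=0)

        if np.array_equal(new_policy, policy):
            return policy, V  # no state is improvable: the policy is optimal
        policy = new_policy
```

The number of such evaluate-and-improve iterations before termination is the quantity bounded from below in the smoothed and worst-case analyses stated above.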