存储式双重动态方案规划的复杂性 (Complexity of Stochastic Dual Dynamic Programming)

Stochastic dual dynamic programming is a cutting plane type algorithm for multi-stage stochastic optimization originated about 30 years ago. In spite of its popularity in practice, there does not exist any analysis on the convergence rates of this method. In this paper, we first establish the number of iterations, i.e., iteration complexity, required by a basic dynamic cutting plane method for solving relatively simple multi-stage optimization problems, by introducing novel mathematical tools including the saturation of search points. We then refine these basic tools and establish the iteration complexity for both deterministic and stochastic dual dynamic programming methods for solving more general multi-stage stochastic optimization problems under the standard stage-wise independence assumption. Our results indicate that the complexity of some deterministic variants of these methods mildly increases with the number of stages $T$, in fact linearly dependent on $T$ for discounted problems. Therefore, they are efficient for strategic decision making which involves a large number of stages, but with a relatively small number of decision variables in each stage. Without explicitly discretizing the state and action spaces, these methods might also be pertinent to the related reinforcement learning and stochastic control areas.

翻译：托盘性双重动态编程是大约30年前开始的多阶段随机优化的剪切型平面算法。尽管在实践中受到欢迎,但对这种方法的趋同率没有任何分析。在本文中,我们首先通过引入包括搜索点饱和在内的新颖数学工具来确定迭代法,从而确定解决相对简单的多阶段优化问题的基本动态切换平面方法所需的迭代数,即迭代复杂性。然后我们完善这些基本工具,并为在标准阶段独立假设下解决更普遍的多阶段随机优化问题的确定型双向动态编程方法确定迭代复杂度。我们的结果显示,这些方法的某些确定型变体的复杂性与阶段数量略有增加($T$),事实上,对折扣问题的线性依赖$T$。因此,这些方法对于涉及许多阶段的战略决策是有效的,但每个阶段的决定变量数量相对较少。在不明确区分国家和行动空间的情况下,这些方法也可能与相关强化学习和控制领域有关。