Stochastic dual dynamic programming is a cutting plane type algorithm for multi-stage stochastic optimization originated about 30 years ago. In spite of its popularity in practice, there does not exist any analysis on the convergence rates of this method. In this paper, we first establish the number of iterations, i.e., iteration complexity, required by a basic dynamic cutting plane method for solving relatively simple multi-stage optimization problems, by introducing novel mathematical tools including the saturation of search points. We then refine these basic tools and establish the iteration complexity for both deterministic and stochastic dual dynamic programming methods for solving more general multi-stage stochastic optimization problems under the standard stage-wise independence assumption. Our results indicate that the complexity of these methods mildly increases with the number of stages $T$, in fact linearly dependent on $T$ for discounted problems. Therefore, they are efficient for strategic decision making which involves a large number of stages, but with a relatively small number of decision variables in each stage. Without explicitly discretizing the state and action spaces, these methods might also be pertinent to the related reinforcement learning and stochastic control areas.
翻译:托盘性双重动态编程是大约30年前开始的多阶段随机优化的剪切机型算法,尽管在实践中很受欢迎,但对这种方法的趋同率没有任何分析。在本文中,我们首先确定迭代数,即迭代复杂性,这是基本动态切换机型方法解决相对简单的多阶段优化问题所需要的,方法是引进新的数学工具,包括搜索点的饱和。然后我们完善这些基本工具,并为确定性和随机性双重动态编程方法确定迭代复杂性,以便在标准阶段独立假设下解决更普遍的多阶段随机优化问题。我们的结果显示,这些方法的复杂性随着阶段的数目增加而略有增加,实际上在折扣问题上依赖$T美元。因此,这些方法对于战略决策来说是有效的,涉及许多阶段,但每个阶段的决策变量数量相对较少。在不明确区分国家和行动空间的情况下,这些方法也可能与相关的强化学习和控制领域有关。